Thanks for the FLIP, nice addition! A few questions before the vote:

1. Interaction with the retry strategy: the FLIP doesn't say where the new timeout hook sits in the retry flow; this is worth spelling out explicitly.

2. "CompletableFuture must be completed synchronously." Is this a javadoc convention or a runtime guarantee? If it's only a convention, it would help to document what happens when users break it. If it's enforced, a brief note would be useful.

3. Failure modes of the user timeout method: two cases aren't covered:
   - The method itself throws: do we swallow the exception and degrade, or fail the job?
   - The method never completes (bug or hang): does the operator stall indefinitely?

4. On the public interface: since timeout's parameter list mirrors eval, how is a var-args eval (e.g. eval(CompletableFuture<T>, String...)) expected to be matched?

5. How has AsyncScalarFunction been considered here? It shares the same timeout-prone remote-call pattern, so it seems natural to extend the same mechanism there as well. Is that in scope, a follow-up, or intentionally left out?

Best,
Lincoln Lee

On Tue, May 26, 2026 at 11:02 Gen Luo <[email protected]> wrote:

> Thanks for driving this proposal forward! This addresses a real pain point
> we've been hearing about for a while.
>
> Many of our users rely on AsyncTableFunction or Lookup Join to implement
> custom external service calls and data fetching, typically for RAG or LLM
> inference scenarios. Due to the inherent instability of these external
> services, timeouts occur occasionally, and users want to apply fallback
> strategies (e.g., falling back to a local rule-based model) rather than
> failing the entire job. However, this hasn't been achievable so far: the
> hard-coded TimeoutException behavior introduces stability risks, forcing
> users to keep increasing the timeout value to absurd levels and work around
> the issue in various hacky ways. Worse, each user tends to hit this pitfall
> independently before realizing the limitation.
>
> Adding a timeout interface not only addresses this pain point, but also
> aligns the API contract between AsyncTableFunction and AsyncFunction,
> avoiding unnecessary confusion for users.
>
> Big +1 from our side; looking forward to seeing this land.
>
> On Mon, May 25, 2026 at 7:53 PM Xia Sun <[email protected]> wrote:
>
> > Hi Kui.Yuan,
> >
> > Thanks for driving this!
> >
> > In our production practice, the asynchronous I/O capability of
> > AsyncTableFunction has shown excellent performance in
> > batch LLM inference scenarios. We urgently need a custom timeout UDF
> > for this use case. It would help us handle inference requests that
> > time out, especially long-context requests, more precisely, and avoid
> > excessive retries that could otherwise block downstream data.
> >
> > +1 to this proposal.
> >
> > Best,
> >
> > Xia
> >
> > On Fri, May 22, 2026 at 11:21 Kui Yuan <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > I'd like to open a discussion for FLIP-580: AsyncTableFunction supports
> > > user-defined timeout handling logic [1].
> > >
> > > An increasing number of users are leveraging AsyncTableFunction to invoke
> > > remote inference clusters. Such invocations are essentially remote
> > > inference requests, which are far more prone to timeouts than regular I/O
> > > operations. Users expect to be able to define custom handling logic when a
> > > timeout occurs (for example, falling back to default data or accumulating
> > > failure statistics) rather than having a TimeoutException thrown directly
> > > and causing the entire job to fail.
> > >
> > > This FLIP proposes allowing users to define custom timeout handling logic
> > > inside AsyncTableFunction.
> > >
> > > I've already discussed the implementation details with @Luogen offline,
> > > and there's a POC attached [2].
> > >
> > > Looking forward to your feedback.
> > >
> > > Bests,
> > > Kui.Yuan
> > >
> > > [1]:
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-580%3A+AsyncTableFunction+supports+user-defined+timeout+handling+logic
> > >
> > > [2]:
> > > https://github.com/yuchengxin/flink/commit/5a46cd05c48e41a582271dcb9d9842e330871a0b
