Hi Lincoln,
Thanks a lot for the great questions! I'll incorporate the relevant
clarifications into the FLIP. Let me share some quick thoughts here:
Q1 (interaction with retry): The timeout reuses the existing flow in
AsyncWaitOperator. The configured timeout is the total budget measured
from the first asyncInvoke. When the async-lookup timer expires, any
pending delayed retry is cancelled and timeout(...) is invoked exactly
once. The retry result/exception predicates are not consulted on the
timeout path.
Q2 ("CompletableFuture must be completed synchronously"): This is a
runtime guarantee, not just a javadoc convention. The codegen'd code
checks whether the future is completed; if it isn't, an error is
raised and the job fails fast.
Q3 (failure modes of the user timeout method):
If the timeout method itself throws, we surface the exception and fail
the job fast — no silent swallowing.
If timeout blocks synchronously and never returns, it would block the
mailbox thread and effectively freeze the task. This is admittedly a
serious situation, and we don't have a mechanism for it today. Adding
one would require a separate timeout-on-timeout, which could break the
single-threaded mailbox assumption. So for now the simpler approach is
to document the contract in the javadoc — timeout must be
non-blocking. Open to suggestions if you have a better idea here.
Q4 (var-args matching): For var-args, the signatures must mirror each
other. If eval is eval(CompletableFuture<T>, String...), then timeout
must be timeout(CompletableFuture<T>, String...). Any mismatch is
rejected at startup with a clear error and I'll add unit tests to
cover this.
Q5 (AsyncScalarFunction): This was an intentional scoping choice. The
use cases driving the custom-timeout requirement today are mostly LLM
calls, where the output is rarely a single scalar — so users tend to
reach for AsyncTableFunction instead of AsyncScalarFunction (the
built-in AsyncPredictFunction also extends AsyncTableFunction). Until
we hear a concrete user need on the scalar side, supporting timeout
only on AsyncTableFunction feels sufficient. Happy to revisit if/when
such a need surfaces.
Thanks again for the careful review!
Best,
Kui.Yuan
Lincoln Lee <[email protected]> 于2026年5月26日周二 15:16写道:
>
> Thanks for the FLIP, nice addition! A few questions before the vote:
>
> 1. Interaction with the retry strategy: the FLIP doesn't say where the new
> timeout hook sits in that flow, worth spelling out explicitly.
>
> 2. "CompletableFuture must be completed synchronously." Is this a javadoc
> convention or a runtime guarantee? If it's only a convention, it would help
> to
> document what happens when users break it. If it's enforced, a brief note
> would
> be useful.
>
> 3. Failure modes of the user timeout method: two cases aren't covered:
> The method itself throws, do we swallow and degrade the Exception, or fail
> the job?
> The method never completes (bug or hang), does the operator stall
> indefinitely?
>
> 4. On the public interface: since timeout's parameter list mirrors eval,
> how is
> a var-args eval (e.g. eval(CompletableFuture<T>, String...)) expected to be
> matched?
>
> 5. How has AsyncScalarFunction been considered here? It shares the same
> timeout-prone remote-call pattern, so it seems natural to extend the same
> mechanism there as well, is that in scope, a follow-up, or intentionally
> left out?
>
>
> Best,
> Lincoln Lee
>
>
> Gen Luo <[email protected]> 于2026年5月26日周二 11:02写道:
>
> > Thanks for driving this proposal forward! This addresses a real pain point
> > we've been hearing about for a while.
> >
> > Many of our users rely on AsyncTableFunction or Lookup Join to implement
> > custom external service calls and data fetching, typically for RAG or LLM
> > inference scenarios. Due to the inherent instability of these external
> > services, timeouts occur occasionally, and users want to apply fallback
> > strategies (e.g., falling back to a local rule-based model) rather than
> > failing the entire job. However, this hasn't been achievable so far — the
> > hard-coded TimeoutException behavior introduces stability risks, forcing
> > users to keep increasing the timeout value to absurd levels and work around
> > the issue in various hacky ways. Worse, each user tends to hit this pitfall
> > independently before realizing the limitation.
> >
> > Adding a timeout interface not only addresses this pain point, but also
> > aligns the API contract between AsyncTableFunction and AsyncFunction,
> > avoiding unnecessary confusion for users.
> >
> > Big +1 from our side — looking forward to seeing this land.
> >
> > On Mon, May 25, 2026 at 7:53 PM Xia Sun <[email protected]> wrote:
> >
> > > Hi Kui.Yuan,
> > >
> > > Thanks for driving this!
> > >
> > > In our production practice, the asynchronous I/O capability of
> > > AsyncTableFunction has shown excellent performance in
> > > batch LLM inference scenarios. We urgently need a custom timeout UDF
> > > for this use case. It would help us handle inference requests that
> > > time out—especially long-context requests—more precisely, and avoid
> > > excessive retries that could otherwise block downstream data.
> > >
> > > +1 to this proposal.
> > >
> > > Best,
> > >
> > > Xia
> > >
> > > Kui Yuan <[email protected]> 于2026年5月22日周五 11:21写道:
> > >
> > > > Hi All,
> > > >
> > > > I'd like to open a discussion for FLIP-580: AsyncTableFunction supports
> > > > user-defined timeout handling logic [1].
> > > >
> > > > An increasing number of users are leveraging AsyncTableFunction to
> > invoke
> > > > remote inference clusters. Such invocations are essentially remote
> > > > inference requests, which are far more prone to timeouts than regular
> > I/O
> > > > operations. Users expect to be able to define custom handling logic
> > when
> > > a
> > > > timeout occurs — for example, falling back to default data or
> > > accumulating
> > > > failure statistics — rather than having a TimeoutException thrown
> > > directly
> > > > and causing the entire job to fail.
> > > >
> > > >
> > > > This FLIP proposal allow users to define custom timeout handling logic
> > > > inside AsyncTableFunction.
> > > >
> > > > I've already discussed the implementation details with @Luogen offline,
> > > and
> > > > there's a POC attached [2].
> > > >
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > Bests,
> > > > Kui.Yuan
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-580%3A+AsyncTableFunction+supports+user-defined+timeout+handling+logic
> > > >
> > > > [2]:
> > > >
> > > >
> > >
> > https://github.com/yuchengxin/flink/commit/5a46cd05c48e41a582271dcb9d9842e330871a0b
> > > >
> > >
> >