Hi Vikram,

Thank you for the question. Yes, that's the tradeoff.
As Jens noted, there could be differences with multi-queue
setups, but starting with the default queue felt like a
reasonable first step, since it already improves on the
current behavior (testing on the API server, which often
can't reach the external systems being tested).

An optional queue parameter would be a natural
follow-up if there's demand. I'd love to hear whether
that would cover your concern.

Thanks,
Anish

On Mon, Feb 23, 2026 at 10:32 AM Vikram Koka via dev
<[email protected]> wrote:
>
> So, the tradeoff here is that people using different connections for
> different worker queues lose this capability?
>
> Did I understand that correctly?
>
> Vikram
>
> On Sun, Feb 22, 2026 at 2:27 PM Anish Giri <[email protected]> wrote:
>
> > Thanks Jarek, thanks Jens, this is really helpful.
> >
> > Understood on passing just the connection_id and having the worker
> > fetch credentials through the standard path. That simplifies things a
> > lot. So the flow would be: the connection is already saved, the test
> > gets queued with the connection_id, and the worker picks it up and
> > retrieves everything the usual way.
> >
> > The crypto-random ID for polling makes a lot of sense: it covers
> > the authorization side and keeps cleanup simple with db clean and
> > timestamps.
> >
> > On queue routing: agreed, default queue for now, as Jens mentioned.
> >
> > I'll start on the endpoints and the worker function and put up an
> > implementation PR. The dispatch side depends on #61153; happy to look
> > at decoupling the dag_run_id requirement there too. Will tag you both
> > when it's up.
> >
> > Anish
> >
> > On Sun, Feb 22, 2026 at 8:57 AM Jens Scheffler <[email protected]>
> > wrote:
> > >
> > > +1 in my view. That would be a proper resolution, with the trade-off
> > > that processing is async and the response to the user might take a few
> > > seconds for "some" worker to pick it up.
> > >
> > > We would need to assume that testing uses the "default" queue; if
> > > different workers are configured, there might be differences, but
> > > adding further complexity would not be reasonable in my view.
> > >
> > > On 22.02.26 15:38, Jarek Potiuk wrote:
> > > > I am all for it :)
> > > >
> > > >> 1. The connection test needs to store an encrypted URI, conn_type, and
> > > >> some timestamps. Is the Callback.data JSON column the right place
> > > >> for that, or does it warrant its own small table?
> > > >
> > > > They don't have to be stored. It's enough to send the connection_id
> > > > (after saving the connection to the DB). The worker can then retrieve
> > > > the credentials the usual way.
> > > > I think it's reasonable to run the connection test only once the
> > > > connection has been saved (not during editing) - and even during
> > > > editing, we could save it automatically for tests.
> > > >
> > > >
> > > >> 2. Stale requests: if a worker crashes mid-test, the record stays
> > > >> in a non-terminal state. Should there be a scheduler-side reaper
> > > >> similar to zombie task detection, or is client-side timeout (60s
> > > >> in the UI) enough?
> > > >
> > > > A good idea would be to generate a random/unique ID for the test
> > > > request. This ID should be random enough to prevent easy guessing,
> > > > ensuring only the client who initiated the request can poll for its
> > > > status—which also serves as a security feature. We can simply store
> > > > such test connection requests (and eventually responses) in a
> > > > database, including a timestamp, and use our standard `db clean` to
> > > > clear them.
> > > >
> > > > J.
> > > >
> > > >
> > > > On Sun, Feb 22, 2026 at 4:52 AM Anish Giri <[email protected]>
> > > > wrote:
> > > >> Hi all,
> > > >>
> > > >> I'd like to discuss moving connection testing off the API server and
> > > >> onto workers. Jarek suggested this direction in a comment on #59643
> > > >> [1], and I think the Callback infrastructure being built for running
> > > >> callbacks on executors is the right foundation for it.
> > > >>
> > > >> Since 2.7.0, test_connection has been disabled by default (#32052).
> > > >> Running it on the API server has two problems: the API server
> > > >> shouldn't be executing user-supplied driver code (Jarek described the
> > > >> ODBC/JDBC risks in detail on #59643), and workers typically have
> > > >> network access to external systems that API servers don't, so test
> > > >> results from the API server can be misleading.
> > > >>
> > > >> Ramit's generic Callback model (#54796 [2]) and Ferruzzi's
> > > >> in-progress executor dispatch (#61153 [3]) together give us most of
> > > >> what's needed. The flow would be:
> > > >>
> > > >> 1. UI calls POST /connections/test
> > > >> 2. API server Fernet-encrypts the connection URI, creates an
> > > >> ExecutorCallback pointing to the test function, returns an ID
> > > >> 3. Scheduler dispatch loop (from #61153) picks it up, sends it
> > > >> to the executor
> > > >> 4. Worker decrypts the URI, builds a transient Connection, calls
> > > >> test_connection(), reports result through the callback path
> > > >> 5. UI polls GET /connections/test/{id} until it gets a terminal
> > > >> state
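To make the shape of that flow concrete, here is a toy simulation of steps 1-5. Everything here is a hypothetical stand-in: a dict plays the callback/result table, a local queue plays the executor, and a stub replaces the real test. The actual implementation would go through Fernet encryption, the ExecutorCallback table, and the scheduler dispatch loop from #61153.

```python
import queue
import secrets

# Hypothetical stand-ins: `results` for the callback/result table,
# `work` for the executor's queue.
results: dict[str, str] = {}
work = queue.Queue()

def post_connections_test(conn_uri: str) -> str:
    """Steps 1-2: queue the test and return a polling ID.
    (The real API server would Fernet-encrypt conn_uri first.)"""
    request_id = secrets.token_urlsafe(16)
    results[request_id] = "PENDING"
    work.put((request_id, conn_uri))
    return request_id

def worker_run_once() -> None:
    """Steps 3-4: a worker picks up one queued test and reports back.
    The real worker would decrypt the URI, build a transient Connection,
    and call the hook's test_connection(); here we fake the check."""
    request_id, conn_uri = work.get()
    ok = conn_uri.startswith("postgres://")  # stand-in for a real test
    results[request_id] = "SUCCESS" if ok else "FAILED"

def get_connections_test(request_id: str) -> str:
    """Step 5: the UI polls this until the state is terminal."""
    return results[request_id]
```

The UI's polling loop then reduces to calling the GET endpoint until it sees something other than "PENDING", or its own timeout fires.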
> > > >>
> > > >> The connection-testing-specific code would be small: a POST endpoint
> > > >> to queue the test, a GET endpoint to poll for results, and the worker
> > > >> function that decrypts and runs test_connection().
> > > >>
> > > >> One thing I noticed: #61153's _enqueue_executor_callbacks currently
> > > >> requires dag_run_id in the callback data dict, and
> > > >> ExecuteCallback.make
> > > >> needs a DagRun for bundle info. Connection tests don't have a DagRun.
> > > >> It would be a small change to make that optional. The dispatch query
> > > >> itself is already generic (selects all PENDING ExecutorCallbacks). I
> > > >> can take a look at decoupling that if it would be useful.
> > > >>
> > > >> A couple of other open questions:
> > > >>
> > > >> 1. The connection test needs to store an encrypted URI, conn_type, and
> > > >> some timestamps. Is the Callback.data JSON column the right place
> > > >> for that, or does it warrant its own small table?
> > > >>
> > > >> 2. Stale requests: if a worker crashes mid-test, the record stays
> > > >> in a non-terminal state. Should there be a scheduler-side reaper
> > > >> similar to zombie task detection, or is client-side timeout (60s
> > > >> in the UI) enough?
> > > >>
> > > >> I explored this earlier in #60618 [4] with a self-contained
> > > >> implementation. Now that the ExecutorCallback dispatch is taking shape
> > > >> in #61153, building on top of it seems like the right direction.
> > > >>
> > > >> Thoughts?
> > > >>
> > > >> Anish
> > > >>
> > > >> [1] https://github.com/apache/airflow/pull/59643
> > > >> [2] https://github.com/apache/airflow/pull/54796
> > > >> [3] https://github.com/apache/airflow/pull/61153
> > > >> [4] https://github.com/apache/airflow/pull/60618
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: [email protected]
> > > >> For additional commands, e-mail: [email protected]
> > > >>
> > > >
> > >
> > >
> >
> >
> >

