Hi Vikram,

Thank you very much for your query. Yes, that's the tradeoff. As Jens noted, there could be differences with multi-queue setups, but starting with the default queue felt like a reasonable first step, since it already improves on the current behavior (testing on the API server, which typically can't reach any external systems).
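If we do add an optional queue parameter later, I'd picture something along these lines. This is only a rough sketch: the request shape, the `route_test` helper, and the `"default"` queue name are placeholder assumptions for illustration, not actual Airflow code.

```python
# Hypothetical sketch of an opt-in queue parameter for the test request.
# ConnectionTestRequest, route_test, and DEFAULT_QUEUE are illustrative
# names, not part of the real Airflow API.
from dataclasses import dataclass
from typing import Optional

DEFAULT_QUEUE = "default"  # assumed name of the executor's default queue


@dataclass
class ConnectionTestRequest:
    connection_id: str
    queue: Optional[str] = None  # optional override for multi-queue setups


def route_test(req: ConnectionTestRequest) -> str:
    # Fall back to the default queue when the caller does not specify one,
    # which preserves the behavior proposed in this thread.
    return req.queue or DEFAULT_QUEUE


print(route_test(ConnectionTestRequest("my_postgres")))
print(route_test(ConnectionTestRequest("my_postgres", queue="dmz")))
```

Leaving the parameter out would keep the default-queue behavior, while deployments where only specific workers can reach the target system could opt in explicitly.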
An optional queue parameter would be a natural follow-up if there's demand. Would love to hear if you think that would cover your concern.

Thanks,
Anish

On Mon, Feb 23, 2026 at 10:32 AM Vikram Koka via dev <[email protected]> wrote:
>
> So, the tradeoff here is that people using different connections for
> different worker queues lose this capability?
>
> Did I understand that correctly?
>
> Vikram
>
> On Sun, Feb 22, 2026 at 2:27 PM Anish Giri <[email protected]> wrote:
>
> > Thanks Jarek, thanks Jens, this is really helpful.
> >
> > Understood on passing just the connection_id and having the worker
> > fetch credentials through the standard path. That simplifies things a
> > lot. So the flow would be: connection is already saved, test gets
> > queued with the connection_id, worker picks it up and retrieves
> > everything the usual way.
> >
> > The crypto-random ID for polling makes a lot of sense, which covers
> > the authorization side and keeps cleanup simple with db clean and
> > timestamps.
> >
> > On queue routing, agreed, default queue for now as Jens mentioned.
> >
> > I'll start on the endpoints and worker function and put up an
> > implementation PR. The dispatch side depends on #61153, happy to look
> > at decoupling the dag_run_id requirement there too. Will tag you both
> > when it's up.
> >
> > Anish
> >
> > On Sun, Feb 22, 2026 at 8:57 AM Jens Scheffler <[email protected]>
> > wrote:
> > >
> > > +1 in my view. That would be a proper resolution - with the trade-off
> > > that processing is async and response to user might take a few seconds
> > > for "some" worker to pick-up.
> > >
> > > We would need to assume that testing is using "default" queue, if
> > > different workers are configured then there might be differences - but
> > > adding further complexity would not be reasonable in my view.
> > >
> > > On 22.02.26 15:38, Jarek Potiuk wrote:
> > > > I am all for it :)
> > > >
> > > >> 1. The connection test needs to store an encrypted URI, conn_type, and
> > > > some timestamps. Is the Callback.data JSON column the right place
> > > > for that, or does it warrant its own small table?
> > > >
> > > > They don't have to be stored. It's enough to send connection_id (after
> > > > saving it to the DB). The worker can retrieve all the credentials the
> > > > usual way workers do.
> > > > I think it's reasonable to only run test connection when it has been
> > > > saved (not during editing) - and even if during editing, we could save
> > > > it automatically for tests.
> > > >
> > > >
> > > >> 2. Stale requests: if a worker crashes mid-test, the record stays
> > > > in a non-terminal state. Should there be a scheduler-side reaper
> > > > similar to zombie task detection, or is client-side timeout (60s
> > > > in the UI) enough?
> > > >
> > > > A good idea would be to generate a random/unique ID for the test
> > > > request. This ID should be random enough to prevent easy guessing,
> > > > ensuring only the client who initiated the request can poll for its
> > > > status—which also serves as a security feature. We can simply store
> > > > such test connection requests (and eventually responses) in a
> > > > database, including a timestamp, and use our standard `db clean` to
> > > > clear them.
> > > >
> > > > J.
> > > >
> > > >
> > > > On Sun, Feb 22, 2026 at 4:52 AM Anish Giri <[email protected]>
> > wrote:
> > > >> Hi all,
> > > >>
> > > >> I'd like to discuss moving connection testing off the API server and
> > > >> onto workers. Jarek suggested this direction in a comment on #59643
> > > >> [1], and I think the Callback infrastructure being built for running
> > > >> callbacks on executors is the right foundation for it.
> > > >>
> > > >> Since 2.7.0, test_connection has been disabled by default (#32052).
> > > >> Running it on the API server has two problems: the API server
> > > >> shouldn't be executing user-supplied driver code (Jarek described the
> > > >> ODBC/JDBC risks in detail on #59643), and workers typically have
> > > >> network access to external systems that API servers don't, so test
> > > >> results from the API server can be misleading.
> > > >>
> > > >> Ramit's generic Callback model (#54796 [2]) and Ferruzzi's
> > > >> in-progress executor dispatch (#61153 [3]) together give us most of
> > > >> what's needed. The flow would be:
> > > >>
> > > >> 1. UI calls POST /connections/test
> > > >> 2. API server Fernet-encrypts the connection URI, creates an
> > > >> ExecutorCallback pointing to the test function, returns an ID
> > > >> 3. Scheduler dispatch loop (from #61153) picks it up, sends it
> > > >> to the executor
> > > >> 4. Worker decrypts the URI, builds a transient Connection, calls
> > > >> test_connection(), reports result through the callback path
> > > >> 5. UI polls GET /connections/test/{id} until it gets a terminal
> > > >> state
> > > >>
> > > >> The connection-testing-specific code would be small: a POST endpoint
> > > >> to queue the test, a GET endpoint to poll for results, and the worker
> > > >> function that decrypts and runs test_connection().
> > > >>
> > > >> One thing I noticed: #61153's _enqueue_executor_callbacks currently
> > > >> requires dag_run_id in the callback data dict, and
> > ExecuteCallback.make
> > > >> needs a DagRun for bundle info. Connection tests don't have a DagRun.
> > > >> It would be a small change to make that optional. The dispatch query
> > > >> itself is already generic (selects all PENDING ExecutorCallbacks). I
> > > >> can take a look at decoupling that if it would be useful.
> > > >>
> > > >> A couple of other open questions:
> > > >>
> > > >> 1. The connection test needs to store an encrypted URI, conn_type, and
> > > >> some timestamps. Is the Callback.data JSON column the right place
> > > >> for that, or does it warrant its own small table?
> > > >>
> > > >> 2. Stale requests: if a worker crashes mid-test, the record stays
> > > >> in a non-terminal state. Should there be a scheduler-side reaper
> > > >> similar to zombie task detection, or is client-side timeout (60s
> > > >> in the UI) enough?
> > > >>
> > > >> I explored this earlier in #60618 [4] with a self-contained
> > > >> implementation. Now that the ExecutorCallback dispatch is taking shape
> > > >> in #61153, building on top of it seems like the right direction.
> > > >>
> > > >> Thoughts?
> > > >>
> > > >> Anish
> > > >>
> > > >> [1] https://github.com/apache/airflow/pull/59643
> > > >> [2] https://github.com/apache/airflow/pull/54796
> > > >> [3] https://github.com/apache/airflow/pull/61153
> > > >> [4] https://github.com/apache/airflow/pull/60618
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: [email protected]
> > > >> For additional commands, e-mail: [email protected]
