Hi everyone,

Hoping to get a few more eyes on the open "AIP-108: Java Task SDK and the
Language Coordinator Layer" and ADRs before we move forward.
Please take a look and share any thoughts when you have a moment.

- AIP:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-108+Java+Task+SDK+and+the+Language+Coordinator+Layer
- ADRs:
https://github.com/apache/airflow/pull/65956/changes/876179ab55d3b31486ee52f4c27abd4e215b0fd0
- PR for adding Coordinator: https://github.com/apache/airflow/pull/65958
- PR for adding Java-SDK: https://github.com/apache/airflow/pull/65956

Even a short comment or a +1 on the AIP helps us understand where the
community stands.
Thank you!

Best,
Jason

On Tue, Apr 28, 2026 at 10:58 PM Jarek Potiuk <[email protected]> wrote:

> Oh super - nice. I am definitely going to take a look (at both) :D
>
> On Tue, Apr 28, 2026 at 9:55 AM Tzu-ping Chung via dev <
> [email protected]> wrote:
>
>> Hi all,
>>
>> I’ve merged the ADRs and the previous LC message (in the other thread)
>> into a document formatted as an AIP:
>>
>> AIP-108 Java Task SDK and the Language Coordinator Layer - Airflow -
>> Apache Software Foundation <https://cwiki.apache.org/confluence/x/pY4mGQ>
>>
>>
>> Various sections are taken mostly directly from the other documents, so
>> you can probably skim them if you’ve already read them. If you haven’t,
>> the AIP document is a good place to start, since it omits the more
>> detailed descriptions and focuses on the high-level interfaces. More
>> details are still available in the ADR documents included in the PRs
>> Jason opened.
>>
>> TP
>>
>>
>> On 28 Apr 2026, at 11:12, Zhe-You(Jason) Liu <[email protected]> wrote:
>>
>> Hi Jens,
>>
>> The ADRs are now available here [1]; I hope that helps clarify some of
>> your concerns.
>>
>> As I understood, Java is statically compiled JAR files, no on-the-fly
>> compile from the Java source tree (correct?)
>>
>> That's correct -- users will compile the JARs themselves and set the
>> `[java] bundles_folder` config to point to the directory containing them.
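Illustratively, that configuration might look like this in airflow.cfg (the `[java] bundles_folder` option is from the message above; the path is a placeholder, not a documented default):

```ini
[java]
# Directory containing the user-compiled JAR bundles (placeholder path)
bundles_folder = /opt/airflow/java-bundles
```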
>>
>> so actually the Dag parsing concept then is quite "static", and once
>> generated there is actually no need to re-parse the Dag in Java mode?
>>
>> Still, if a static JAR is deployed, what does the deploy lifecycle look
>> like? Would you need to restart the Dag Parser and respective workers
>> with each new deploy?
>>
>> The lifecycle for dag-parsing will be the same as the current
>> `DagFileProcessorProcess`. The coordinator comes into play before we
>> start the actual parse-file entrypoint [2]. Regardless of what language
>> the parse-file subprocess is implemented in, as long as it returns a
>> valid serialized Dag JSON in msgpack over IPC, the behavior will remain
>> the same. So there is no need to restart the `airflow dag-processor`,
>> and the `airflow worker` will not be involved in dag processing at all.
>>
>> Is it really realistic that a "LocalExecutor" needs to be supported, or
>> can we limit it to e.g. only remote executors to reduce the coupling and
>> complexity of operating the core?
>>
>> The coordinator is the interface that decides how we want to launch the
>> subprocess for both dag-processing and workload-execution. This means it
>> will support **any** executor out of the box, as we integrate the
>> coordinator at the TaskSDK level for workload-execution [3].
>>
>> What overhead does the "Coordinator layer" generate compared to a
>> Java-specific supervisor implementation?
>>
>> The "Coordinator layer" is the interface for Airflow-Core to interact
>> with the target language subprocess. We still need a Java-specific
>> supervisor implementation, which is the first PR I mentioned in another
>> thread -- the Java SDK [4].
>>
>> Is it not only a new / additional process, but is IPC also involved
>> then? And at least I saw performance problems, e.g. using very large
>> XComs, where even heartbeats are lost due to long-running IPC
>>
>> I would consider this out of scope for the "Java SDK and the Coordinator
>> Layer" AIP, as the current Python-native TaskSDK supervisor will
>> encounter the same issue. The multi-language support here follows the
>> same protocol that the current TaskSDK uses.
>>
>> [1] https://github.com/apache/airflow/pull/65956/changes/876179ab55d3b31486ee52f4c27abd4e215b0fd0
>> [2] https://github.com/apache/airflow/pull/65958/changes#diff-564fd0a8fbe4cc47864a8043fcc1389b33120c88bb35852b26f45c36b902f70bR540-R573
>> [3] https://github.com/apache/airflow/pull/65958/changes#diff-5bef10ab2956abf7360dbf9b509b6e1113407874d24abcc1b276475051f13abfR1991-R2001
>> [4] https://github.com/apache/airflow/pull/65956
>>
>> Thanks.
>>
>> Best,
>> Jason
>>
>> On Tue, Apr 28, 2026 at 3:03 AM Jens Scheffler <[email protected]>
>> wrote:
>>
>> Thanks TP for raising this!
>>
>> I would need to sleep over the block of information in the described
>> details, and I might have some detail questions just to ensure I
>> understood right. So an ADR or AIP document might be a better place to
>> comment than an email thread.
>>
>> Things that jump into my head (there would be more coming as I think
>> about it):
>>
>>  * As I understood Java is static compiled JAR files, no on-the-fly
>>    compile from Java source tree (correct?) - in this case also the
>>    Dags are "static until re-deploy" - so actually the Dag parsing
>>    concept then is quite "static" and once generated actually no need
>>    to re-parse the Dag in Java mode?
>>  * Still if static JAR deployed how does the deploy lifecycle look
>>    like? Would you need to restart with new deploy the Dag Parser and
>>    respective workers?
>>  * Is it really realistic that a "LocalExecutor" needs to be supported
>>    or can we limit it to e.g. only remote executors to reduce coupling
>>    and complexity of operating the core?
>>  * What overhead does the "Coordinator layer" generate compared to a
>>    Java specific supervisor implementation? Is it not only a new /
>>    additional process but also IPC involved then. And at least I saw
>>    also performance problems e.g. using very large XComs where even
>>    heartbeats are lost due to long running IPC
>>    (https://github.com/apache/airflow/issues/64628)
>>  * (There might be more coming :-) )
>>
>> Jens
>>
>> P.S.: Questions here also do not mean rejection; I would just like to
>> understand what complexity and overhead we are adding with all this.
>>
>> On 27.04.26 15:21, Jarek Potiuk wrote:
>>
>> Also I would like to point out one thing.
>>
>> This should not be `LAZY CONSENSUS` just yet; this is quite a big thing
>> to discuss. I missed that the subject already had it.
>>
>> At this stage this is really a discussion (I renamed the thread),
>> because ... we have never discussed it before.
>>
>> LAZY CONSENSUS should be really called for after initial discussion (on
>> devlist) points to us actually reaching the consensus.
>>
>> While this one is unlikely to cause much controversy (I think),
>> sufficient time for people to discuss and digest it before calling for
>> lazy consensus is a necessary prerequisite. We simply need time to build
>> consensus.
>>
>> We have a few processes where we build a "general" consensus first, and
>> then we apply it only to particular cases (such as new providers). But in
>> most cases when we have a "big" thing to discuss, we need to build
>> consensus on the devlist first. While in some cases people discussed
>> things off-list and came to some conclusions (which is perfectly fine),
>> bringing it to the list as a consensus, where we do not know if we have
>> achieved it yet, is - I think - a bit premature.
>>
>> See the lazy consensus explanation [1] and "consensus building" [2]
>>
>> [1] Lazy consensus -
>> https://community.apache.org/committers/decisionMaking.html#lazy-consensus
>> [2] Consensus building -
>> https://community.apache.org/committers/decisionMaking.html#consensus-building
>>
>>
>> J.
>>
>> On Mon, Apr 27, 2026 at 1:27 PM Aritra Basu <[email protected]>
>> wrote:
>>
>> Hey TP,
>>
>> Overall +1, this is quite an interesting implementation. A couple of
>> questions: is a provider the right place for the coordinator? I don't
>> have strong opinions or alternatives, but I am curious.
>>
>> Also, for the parser I wanted to understand a bit better how it works; I
>> tried going through the SDK but wasn't able to fully understand it. Also
>> +1 to Jarek's recommendation for documentation.
>>
>>
>>
>> --
>> Regards,
>> Aritra Basu
>>
>> On Mon, 27 Apr 2026, 11:39 am Tzu-ping Chung via dev, <
>> [email protected]> wrote:
>>
>> Hi all,
>>
>> As mentioned in the latest dev call, we have been developing a Java SDK
>> with changes to Airflow in a separate fork[1]. We plan to start merging
>> the Java SDK work back into the OSS repository.
>>
>> We see this as a natural step following initial work in AIP-72[2], which
>> created “a clean language agnostic interface for task execution, with
>> support for multiple language bindings” (quoted from the proposal).
>>
>> The Java SDK also uses Ash’s addition of @task.stub[3] for the Go SDK,
>> to declare a task in a DAG to be “implemented elsewhere” (not in the
>> annotated function). Similar to the Go SDK, we also created a Java
>> library that users can use to write task implementations for Airflow to
>> execute at runtime.
>>
>>
>> [1]: https://github.com/astronomer/airflow/tree/feature/java-all
>> [2]: https://cwiki.apache.org/confluence/x/xgmTEg
>> [3]: https://github.com/apache/airflow/pull/56055
>>
>> The user-facing syntax for a stub task would be the same as implemented
>> by the Go SDK:
>>
>>     @task.stub(queue="java-tasks")
>>     def my_task(): ...
>>
>> With a new configuration option to map tasks in a queue to be executed
>> by a specific SDK:
>>
>>     [sdk]
>>     queue_to_sdk = {"java-tasks": "java"}
>>
>> The configuration is needed for some executors the Go SDK currently does
>> not support. The Go SDK currently relies on each executor worker process
>> to specify which queues they listen to, but this is not always viable,
>> since some executors—LocalExecutor, for example—do not have the concept
>> of worker processes.
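To make the routing concrete, here is a small illustrative Python sketch of how a queue-to-SDK mapping like the one above could be resolved at dispatch time. The helper name and the fall-back-to-Python behavior are assumptions for illustration, not Airflow's actual implementation; only the `queue_to_sdk` option and its JSON shape come from the proposal:

```python
import json

# Value of the [sdk] queue_to_sdk option, as shown in the proposal.
raw_option = '{"java-tasks": "java"}'

def sdk_for_queue(queue: str, option_value: str, default: str = "python") -> str:
    """Resolve which language SDK should execute a task on the given queue.

    Queues without an explicit mapping fall back to the default (Python)
    SDK. This helper is illustrative only.
    """
    mapping = json.loads(option_value)
    return mapping.get(queue, default)

print(sdk_for_queue("java-tasks", raw_option))  # java
print(sdk_for_queue("some-other-queue", raw_option))  # python
```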
>>
>> The Coordinator Layer
>> =====================
>>
>> When the Go SDK was implemented, it left out Runtime Airflow plugins as
>> a future topic. This includes custom XCom backends, secrets backends
>> lookup for connections and variables, etc. These components are
>> implemented in Python, and a Java task cannot easily use the feature
>> unless we also implement the lookup logic in Java. We don’t want to do
>> that since it introduces significant overhead to writing plugins, and
>> the overhead multiplies with each new language SDK.
>>
>> Fortunately, the current execution-time task runner already uses a
>> two-layer design. When an executor wants to run a task, it starts a
>> (Python) task runner process that talks to Airflow Core through the
>> Execution API, and *forks* another (Python) process, which talks to the
>> task runner through TCP, to run the actual task code. Airflow plugins
>> simply go into the task runner process.
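The two-layer design above can be sketched in miniature. This toy Python example uses a forked child and a Unix socket pair in place of the real Execution API and TCP protocol, purely to illustrate the process split between the task runner and the process running the task code (it is not Airflow's actual protocol):

```python
import json
import os
import socket

# Parent plays the "task runner"; the forked child plays the process
# that runs the actual task code and reports back over a local socket.
parent_sock, child_sock = socket.socketpair()

pid = os.fork()
if pid == 0:
    # Child: pretend to run the task, then send a result to the runner.
    parent_sock.close()
    result = {"state": "success", "return_value": 42}
    child_sock.sendall(json.dumps(result).encode())
    child_sock.close()
    os._exit(0)

# Parent ("task runner"): wait for the child's report.
child_sock.close()
payload = parent_sock.recv(4096)
report = json.loads(payload.decode())
os.waitpid(pid, 0)
print(report["state"])  # success
```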
>>
>> This design works well for us since it keeps all the Airflow plugins in
>> Python. The only thing missing is an abstraction for the task runner
>> process to run tasks in any language. We are calling this new layer the
>> **Coordinator**.
>>
>> When a DAG bundle is loaded, it not only tells Airflow how to find the
>> DAGs (and the tasks in them), but also how to *run* each task. Current
>> Python tasks use the Python Coordinator, running tasks by forking as
>> previously described. A new JVM Coordinator will instruct the task
>> runner how to run tasks packaged in JAR files.
>>
>> Each coordinator implements a base interface (BaseRuntimeCoordinator)
>> that handles three concerns:
>>
>> - Discovery: determining whether a given file belongs to this
>> coordinator (e.g. JAR files for Java).
>> - DAG parsing: returning a runtime-specific subprocess command to parse
>> DAG files in the target language.
>> - Task execution: returning a runtime-specific subprocess command to
>> execute tasks in the target runtime.
>>
>> The base class owns the full bridge lifecycle—TCP servers, subprocess
>> management, and cleanup—so language providers only need to implement
>> these three methods.
>>
>> The coordinator translates a DagFileParseRequest (for DAG parsing) and
>> StartupDetails (for Task execution) data model (as declared in Airflow)
>> into the appropriate commands for the target runtime. For example, a
>> “java -classpath ... /path/to/MainClass ...” subprocess command that
>> points to the correct JAR file and main class in this case.
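A rough, hypothetical Python sketch of a JVM coordinator covering the three concerns. `BaseRuntimeCoordinator` is named in the proposal, but the method names, the request model, and the main-class names below are invented stand-ins for illustration only:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class StartupDetails:
    """Stand-in for the task-execution request model mentioned above."""
    jar_path: str
    main_class: str

class BaseRuntimeCoordinator:
    """Stand-in base class; the real interface lives in the proposed PRs."""
    def owns_file(self, path: str) -> bool: raise NotImplementedError
    def dag_parse_command(self, path: str) -> list: raise NotImplementedError
    def task_exec_command(self, details: StartupDetails) -> list: raise NotImplementedError

class JvmCoordinator(BaseRuntimeCoordinator):
    # Discovery: JAR files belong to this coordinator.
    def owns_file(self, path: str) -> bool:
        return Path(path).suffix == ".jar"

    # DAG parsing: subprocess command to parse DAGs in the target language.
    def dag_parse_command(self, path: str) -> list:
        return ["java", "-classpath", path, "org.example.DagParserMain"]

    # Task execution: subprocess command to run a task from a JAR.
    def task_exec_command(self, details: StartupDetails) -> list:
        return ["java", "-classpath", details.jar_path, details.main_class]

coord = JvmCoordinator()
print(coord.owns_file("bundle.jar"))  # True
print(coord.task_exec_command(StartupDetails("bundle.jar", "org.example.TaskMain")))
```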
>>
>> Coordinators as Airflow Providers
>> =================================
>>
>> The base coordinator interface and the Python coordinator will live in
>> “airflow.sdk.execution_time”. Other coordinators (for foreign languages)
>> are registered through the existing Airflow provider mechanism. Each SDK
>> provider declares its coordinator in its provider.yaml under a
>> “coordinators” extension point. Both ProvidersManager (airflow-core) and
>> ProvidersManagerTaskRuntime (task-sdk) discover coordinators through
>> this extension point. This means adding a new language runtime requires
>> only a provider package. No changes to Airflow Core are needed.
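As a purely hypothetical illustration of that extension point, a language SDK provider's provider.yaml might declare something like the following. Only the “coordinators” key comes from the proposal; the package name and class path are guesses:

```yaml
# provider.yaml for a hypothetical language SDK provider
package-name: apache-airflow-providers-sdk-java
coordinators:
  - airflow.providers.sdk.java.coordinator.JvmCoordinator
```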
>>
>> The new JVM-based coordinator will live in the namespace
>> “airflow.providers.sdk.java”. This is not the most accurate name
>> (technically it should be “jvm” instead), but in practice most users
>> will recognize it, and (from my understanding) other JVM language users
>> (e.g. Kotlin, Scala) are already well-versed enough dealing with Java
>> interoperability to understand “java” means JVM in this context.
>>
>> Writing DAGs in Java
>> ====================
>>
>> This is not strictly connected to AIP-72, but we consider it a natural
>> next step since we can now implement tasks in a foreign language. Being
>> able to define the DAG in the same language as the task implementation
>> is useful, since writing Python, even if only with minimal syntax, is
>> still a hurdle for those not already familiar with it, or not even
>> allowed to run it. There are mainly three things we need on top of the
>> task implementation interface:
>>
>> - DAG flags (e.g. schedule, max_active_tasks)
>> - Task flags (e.g. trigger_rule, weight_rule)
>> - Task dependencies
>>
>> A proof-of-concept implementation is included with other changes
>> proposed elsewhere in this document.
>>
>> Lazy Consensus Topics
>> =====================
>>
>> We’re calling for lazy consensus for the following topics:
>>
>> - A new “queue_to_sdk” configuration option to route tasks to a specific
>> language SDK.
>> - A new coordinator layer in the SDK to route implementations at
>> execution time.
>> - New providers under airflow.providers.sdk to provide additional
>> language support.
>> - Develop the Go SDK to support the proposed model and a provider
>> package for the coordinator. (Existing features stay as-is; no breaking
>> changes.)
>> - Add the new Java SDK and the corresponding provider package.
>>
>> TP
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:[email protected]
>> For additional commands, e-mail:[email protected]
>>