Hi everyone,

Hoping to get a few more eyes on the open "AIP-108: Java Task SDK and the Language Coordinator Layer" and its ADRs before we move forward. Please take a look and share any thoughts when you have a moment.
- AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-108+Java+Task+SDK+and+the+Language+Coordinator+Layer
- ADRs: https://github.com/apache/airflow/pull/65956/changes/876179ab55d3b31486ee52f4c27abd4e215b0fd0
- PR for adding the Coordinator: https://github.com/apache/airflow/pull/65958
- PR for adding the Java SDK: https://github.com/apache/airflow/pull/65956

Even a short comment or a +1 on the AIP helps us understand where the community stands. Thank you!

Best,
Jason

On Tue, Apr 28, 2026 at 10:58 PM Jarek Potiuk <[email protected]> wrote:

> Oh super - nice. I am definitely going to take a look (at both) :D
>
> On Tue, Apr 28, 2026 at 9:55 AM Tzu-ping Chung via dev <[email protected]> wrote:
>
>> Hi all,
>>
>> I’ve merged the ADRs and the previous LC message (in the other thread)
>> into a document formatted as an AIP:
>>
>> AIP-108 Java Task SDK and the Language Coordinator Layer:
>> https://cwiki.apache.org/confluence/x/pY4mGQ
>>
>> Various sections are taken mostly directly from the other documents, so
>> you can probably skim them if you’ve already read them. If you haven’t,
>> the AIP document would be a good place to start, since it removes some of
>> the more detailed descriptions and focuses on the high-level interfaces.
>> More details are still available in the ADR documents included in the PRs
>> Jason opened.
>>
>> TP
>>
>>
>> On 28 Apr 2026, at 11:12, Zhe-You(Jason) Liu <[email protected]> wrote:
>>
>> Hi Jens,
>>
>> The ADRs are now available here [1]; I hope that helps clarify some of
>> your concerns.
>>
>> > As I understood Java is static compiled JAR files, no on-the-fly
>> > compile from Java source tree (correct?)
>>
>> That's correct -- the user will compile the JARs themselves and set the
>> `[java] bundles_folder` config to point to the directory of those JAR
>> files.
>>
>> > so actually the Dag parsing concept then is quite "static" and once
>> > generated actually no need to re-parse the Dag in Java mode?
>> > Still if static JAR deployed how does the deploy lifecycle look like?
>> > Would you need to restart the Dag Parser and respective workers with a
>> > new deploy?
>>
>> The lifecycle for dag-parsing will be the same as how the current
>> `DagFileProcessorProcess` acts. The coordinator comes into play before we
>> start the actual parse-file entrypoint [2]. Regardless of what language
>> the parse-file subprocess is implemented in, as long as it returns a
>> valid serialized Dag JSON in msgpack over IPC, the behavior will remain
>> the same. So there is no need to restart the `airflow dag-processor`, and
>> the `airflow worker` will not be involved in dag processing at all.
>>
>> > Is it really realistic that a "LocalExecutor" needs to be supported or
>> > can we limit it to e.g. only remote executors to reduce coupling and
>> > complexity of operating the core?
>>
>> The coordinator is the interface that decides how we want to launch the
>> subprocess for both dag-processing and workload-execution. This means it
>> will support **any** executor out of the box, as we integrate the
>> coordinator at the TaskSDK level for workload-execution [3].
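For concreteness, here is a minimal sketch of what the configuration mentioned above could look like in airflow.cfg. The `[java] bundles_folder` option comes from the answer above and `queue_to_sdk` from TP's proposal quoted further down; the example path is made up, and the exact section and option names may change as the AIP and PRs evolve:

[java]
# Directory of user-compiled JAR bundles (no on-the-fly compilation from source).
# The path below is only an example.
bundles_folder = /opt/airflow/java-bundles

[sdk]
# Route tasks queued on "java-tasks" to the Java SDK (see TP's proposal below).
queue_to_sdk = {"java-tasks": "java"}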
>>
>> > What overhead does the "Coordinator layer" generate compared to a Java
>> > specific supervisor implementation?
>>
>> The "Coordinator layer" is the interface for Airflow-Core to interact
>> with the target language subprocess. We still need a Java-specific
>> supervisor implementation, which is the first PR I mentioned in another
>> thread -- the Java SDK [4].
>>
>> > Is it not only a new / additional process but also IPC involved then?
>> > And at least I saw also performance problems e.g. using very large
>> > XComs where even heartbeats are lost due to long running IPC
>>
>> I would consider this out of scope of the "Java SDK and the Coordinator
>> Layer" AIP, as the current Python-native TaskSDK supervisor will
>> encounter the same issue. The multi-language support here follows the
>> same protocol that the current TaskSDK uses.
>>
>> [1] https://github.com/apache/airflow/pull/65956/changes/876179ab55d3b31486ee52f4c27abd4e215b0fd0
>> [2] https://github.com/apache/airflow/pull/65958/changes#diff-564fd0a8fbe4cc47864a8043fcc1389b33120c88bb35852b26f45c36b902f70bR540-R573
>> [3] https://github.com/apache/airflow/pull/65958/changes#diff-5bef10ab2956abf7360dbf9b509b6e1113407874d24abcc1b276475051f13abfR1991-R2001
>> [4] https://github.com/apache/airflow/pull/65956
>>
>> Thanks.
>>
>> Best,
>> Jason
>>
>> On Tue, Apr 28, 2026 at 3:03 AM Jens Scheffler <[email protected]> wrote:
>>
>> Thanks TP for raising this!
>>
>> I would need to sleep over the block of information and the described
>> details, and might have some detail questions just to ensure I understood
>> right. So an ADR or AIP document might be better to comment on than an
>> email thread.
>>
>> Things that jump into my head, but there would be more coming thinking
>> about it:
>>
>> * As I understood Java is static compiled JAR files, no on-the-fly
>>   compile from Java source tree (correct?) - in this case also the
>>   Dags are "static until re-deploy" - so actually the Dag parsing
>>   concept then is quite "static" and once generated actually no need
>>   to re-parse the Dag in Java mode?
>> * Still if static JAR deployed how does the deploy lifecycle look
>>   like? Would you need to restart the Dag Parser and respective
>>   workers with a new deploy?
>> * Is it really realistic that a "LocalExecutor" needs to be supported
>>   or can we limit it to e.g. only remote executors to reduce coupling
>>   and complexity of operating the core?
>> * What overhead does the "Coordinator layer" generate compared to a
>>   Java specific supervisor implementation? Is it not only a new /
>>   additional process but also IPC involved then? And at least I saw
>>   also performance problems e.g. using very large XComs where even
>>   heartbeats are lost due to long running IPC
>>   (https://github.com/apache/airflow/issues/64628)
>> * (There might be more coming :-) )
>>
>> Jens
>>
>> P.S.: Questions here also do not mean rejection, but I would like to
>> understand which complexity and overhead we are adding with all this.
>>
>> On 27.04.26 15:21, Jarek Potiuk wrote:
>>
>> Also I would like to point out one thing.
>>
>> This should not be `LAZY CONSENSUS` just yet, this is quite a big thing
>> to discuss. I missed that the subject already had it.
>>
>> At this stage this is really a discussion (I renamed the thread), because
>> ... we have never discussed it before.
>>
>> LAZY CONSENSUS should really be called for after the initial discussion
>> (on the devlist) points to us actually reaching a consensus.
>>
>> While this one is unlikely to cause much controversy (I think), sufficient
>> time for people to discuss and digest it before calling for lazy consensus
>> is a necessary prerequisite. We simply need time to build consensus.
>>
>> We have a few processes where we build a "general" consensus first, and
>> then we apply it only to particular cases (such as new providers). But in
>> most cases when we have a "big" thing to discuss, we need to build
>> consensus on the devlist first. While in some cases people discussed
>> things off-list and came to some conclusions (which is perfectly fine) -
>> bringing it to the list as a consensus, where we do not know if we
>> achieved it yet, is - I think - a bit premature.
>>
>> See the lazy consensus explanation [1] and consensus building [2].
>>
>> [1] Lazy consensus -
>> https://community.apache.org/committers/decisionMaking.html#lazy-consensus
>> [2] Consensus building -
>> https://community.apache.org/committers/decisionMaking.html#consensus-building
>>
>> J.
>>
>> On Mon, Apr 27, 2026 at 1:27 PM Aritra Basu <[email protected]> wrote:
>>
>> Hey TP,
>>
>> Overall +1, this is quite an interesting implementation. A couple of
>> questions: is a provider the right place for the coordinator? I don't
>> have strong opinions or alternatives, but I am curious.
>>
>> Also, for the parser I wanted to understand a bit better how it works. I
>> tried going through the SDK but wasn't able to fully understand it. Also
>> +1 to Jarek's recommendation for documentation.
>>
>> --
>> Regards,
>> Aritra Basu
>>
>> On Mon, 27 Apr 2026, 11:39 am Tzu-ping Chung via dev <[email protected]> wrote:
>>
>> Hi all,
>>
>> As mentioned in the latest dev call, we have been developing a Java SDK
>> with changes to Airflow in a separate fork [1]. We plan to start merging
>> the Java SDK work back into the OSS repository.
>>
>> We see this as a natural step following the initial work in AIP-72 [2],
>> which created “a clean language agnostic interface for task execution,
>> with support for multiple language bindings” (quoted from the proposal).
>>
>> The Java SDK also uses Ash’s addition of @task.stub [3] for the Go SDK,
>> to declare a task in a DAG to be “implemented elsewhere” (not in the
>> annotated function). Similar to the Go SDK, we also created a Java
>> library that users can use to write task implementations for Airflow to
>> execute at runtime.
>>
>> [1]: https://github.com/astronomer/airflow/tree/feature/java-all
>> [2]: https://cwiki.apache.org/confluence/x/xgmTEg
>> [3]: https://github.com/apache/airflow/pull/56055
>>
>> The user-facing syntax for a stub task would be the same as implemented
>> by the Go SDK:
>>
>> @task.stub(queue="java-tasks")
>> def my_task(): ...
>>
>> With a new configuration option to map tasks in a queue to be executed by
>> a specific SDK:
>>
>> [sdk]
>> queue_to_sdk = {"java-tasks": "java"}
>>
>> The configuration is needed for some executors the Go SDK currently does
>> not support. The Go SDK currently relies on each executor worker process
>> to specify which queues they listen to, but this is not always viable,
>> since some executors—LocalExecutor, for example—do not have the concept
>> of worker processes.
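To make the intended usage concrete, here is a rough, illustrative sketch of a DAG file that declares a stub task routed to the Java SDK. It assumes the Airflow 3 `airflow.sdk` import path; the DAG name and schedule are made up, and the exact decorator surface follows the @task.stub PR referenced above and may still change:

from airflow.sdk import dag, task


@dag(schedule=None)  # hypothetical DAG; name and schedule are for illustration only
def java_backed_pipeline():
    @task.stub(queue="java-tasks")
    def my_task():
        ...  # no Python body: the implementation lives in a JAR, routed via the queue mapping

    my_task()


java_backed_pipeline()

The file itself is still parsed by the ordinary Python DAG processor; only the execution of my_task is routed to the Java runtime through the queue-to-SDK mapping.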
>>
>> The Coordinator Layer
>> =====================
>>
>> When the Go SDK was implemented, it left out Runtime Airflow plugins as a
>> future topic. This includes custom XCom backends, secrets backends lookup
>> for connections and variables, etc. These components are implemented in
>> Python, and a Java task cannot easily use these features unless we also
>> implement the lookup logic in Java. We don’t want to do that, since it
>> introduces significant overhead to writing plugins, and the overhead
>> multiplies with each new language SDK.
>>
>> Fortunately, the current execution-time task runner already uses a
>> two-layer design. When an executor wants to run a task, it starts a
>> (Python) task runner process that talks to Airflow Core through the
>> Execution API, and *forks* another (Python) process, which talks to the
>> task runner through TCP, to run the actual task code. Airflow plugins
>> simply go into the task runner process.
>>
>> This design works well for us since it keeps all the Airflow plugins in
>> Python. The only thing missing is an abstraction for the task runner
>> process to run tasks in any language. We are calling this new layer the
>> **Coordinator**.
>>
>> When a DAG bundle is loaded, it not only tells Airflow how to find the
>> DAGs (and the tasks in them), but also how to *run* each task. Current
>> Python tasks use the Python Coordinator, running tasks by forking as
>> previously described. A new JVM Coordinator will instruct the task runner
>> how to run tasks packaged in JAR files.
>>
>> Each coordinator implements a base interface (BaseRuntimeCoordinator)
>> that handles three concerns:
>>
>> - Discovery: determining whether a given file belongs to this coordinator
>>   (e.g. JAR files for Java).
>> - DAG parsing: returning a runtime-specific subprocess command to parse
>>   DAG files in the target language.
>> - Task execution: returning a runtime-specific subprocess command to
>>   execute tasks in the target runtime.
>>
>> The base class owns the full bridge lifecycle—TCP servers, subprocess
>> management, and cleanup—so language providers only need to implement
>> these three methods.
>>
>> The coordinator translates a DagFileParseRequest (for DAG parsing) and a
>> StartupDetails (for task execution) data model (as declared in Airflow)
>> into the appropriate commands for the target runtime. For example, in
>> this case a “java -classpath ... /path/to/MainClass ...” subprocess
>> command that points to the correct JAR file and main class.
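As a purely illustrative reading of the three concerns above, the following self-contained Python sketch shows the shape such a coordinator could take. The real base class lives in airflow.sdk.execution_time per the proposal, owns the TCP bridge, subprocess management and cleanup, and receives DagFileParseRequest / StartupDetails objects rather than the simplified arguments used here; every class and method name below is a stand-in, not the actual API from the linked PRs.

from __future__ import annotations

import abc
from pathlib import Path


class BaseRuntimeCoordinator(abc.ABC):
    """Stand-in for the proposed base interface; the bridge lifecycle is omitted here."""

    @abc.abstractmethod
    def owns_file(self, path: Path) -> bool:
        """Discovery: does this file belong to this coordinator?"""

    @abc.abstractmethod
    def dag_parse_command(self, bundle_path: Path) -> list[str]:
        """DAG parsing: subprocess command that parses DAG files in the target language."""

    @abc.abstractmethod
    def task_execution_command(self, bundle_path: Path, task_id: str) -> list[str]:
        """Task execution: subprocess command that runs the task in the target runtime."""


class JvmCoordinator(BaseRuntimeCoordinator):
    # Discovery: JAR bundles belong to the JVM coordinator.
    def owns_file(self, path: Path) -> bool:
        return path.suffix == ".jar"

    # DAG parsing: the command must emit the serialized Dag (msgpack over IPC),
    # analogous to the "java -classpath ... /path/to/MainClass ..." example above.
    def dag_parse_command(self, bundle_path: Path) -> list[str]:
        return ["java", "-classpath", str(bundle_path), "ExampleDagParserMain"]

    # Task execution: same idea, pointing the JVM at a task-runner entrypoint.
    def task_execution_command(self, bundle_path: Path, task_id: str) -> list[str]:
        return ["java", "-classpath", str(bundle_path), "ExampleTaskRunnerMain", task_id]

The entrypoint class names are placeholders, mirroring the elided parts of the command-line example in the proposal.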
>>
>> Coordinators as Airflow Providers
>> =================================
>>
>> The base coordinator interface and the Python coordinator will live in
>> “airflow.sdk.execution_time”. Other coordinators (for foreign languages)
>> are registered through the existing Airflow provider mechanism. Each SDK
>> provider declares its coordinator in its provider.yaml under a
>> “coordinators” extension point. Both ProvidersManager (airflow-core) and
>> ProvidersManagerTaskRuntime (task-sdk) discover coordinators through this
>> extension point. This means adding a new language runtime requires only a
>> provider package; no changes to Airflow Core are needed.
>>
>> The new JVM-based coordinator will live in the namespace
>> “airflow.providers.sdk.java”. This is not the most accurate name
>> (technically it should be “jvm” instead), but in practice most users will
>> recognize it, and (from my understanding) other JVM language users (e.g.
>> Kotlin, Scala) are already well-versed enough in dealing with Java
>> interoperability to understand that “java” means JVM in this context.
>>
>> Writing DAGs in Java
>> ====================
>>
>> This is not strictly connected to AIP-72, but we consider it a natural
>> next step, since we can now implement tasks in a foreign language. Being
>> able to define the DAG in the same language as the task implementation is
>> useful, since writing Python, even if only with minimal syntax, is still
>> a hurdle for those not already familiar with it, or not even allowed to
>> run it. There are mainly three things we need on top of the task
>> implementation interface:
>>
>> - DAG flags (e.g. schedule, max_active_tasks)
>> - Task flags (e.g. trigger_rule, weight_rule)
>> - Task dependencies
>>
>> A proof-of-concept implementation is included with other changes proposed
>> elsewhere in this document.
>>
>> Lazy Consensus Topics
>> =====================
>>
>> We’re calling for lazy consensus on the following topics:
>>
>> - A new “queue_to_sdk” configuration option to route tasks to a specific
>>   language SDK.
>> - A new coordinator layer in the SDK to route implementations at
>>   execution time.
>> - New providers under airflow.providers.sdk to provide additional
>>   language support.
>> - Develop the Go SDK to support the proposed model and a provider package
>>   for the coordinator. (Existing features stay as-is; no breaking
>>   changes.)
>> - Add the new Java SDK and the corresponding provider package.
>>
>> TP
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
