Aside from the points on WASM and Protobuf, are there specific points you are 
looking for?

I’m asking mostly to cut down on back-and-forth ahead of time. It is easier to 
get all the requirements out first than to go through multiple cycles of 
proposal, new requirements not covered, and another proposal.


> On 7 May 2026, at 23:41, Philippe Gagnon <[email protected]> wrote:
> 
> Hello dev list,
> 
> I think the protocol framing needs further discussion, because I don't think 
> the AIP captures what's actually being proposed.
> 
> The supervisor <> task protocol today is a private implementation detail. 
> AIP-108 promotes it to a public, cross-language contract without having been 
> designed for that role. IMO that deserves its own AIP, and it should come 
> before the Java SDK goes in, not after.
> 
> I did see a note about a protocol AIP once the coordinator lands (from TP), 
> but I'd flip the order. If the Java SDK ships first, it implicitly defines 
> the spec, and every later SDK has to match its accidental shape, or the spec 
> changes and the early SDKs get revised.
> 
> IMO we need further clarity on:
> 
> Transport layer: BaseCoordinator today defines what the execution model 
> should look like too broadly.
> 
> The base class defines the public interface, but it also implements behavior 
> that anchors the execution model (subprocess) and the communications 
> mechanism (IPC). That's already inconsistent with the established mechanism: 
> the existing Python task execution path uses socketpairs over fd 0, but the 
> coordinator bridge uses TCP on 127.0.0.1 with random ports. More broadly, it 
> forecloses implementations that don't fit that model, e.g. shared memory, 
> in-process execution, etc. For example, this would make building something 
> like a WASM coordinator with no subprocess overhead at all impractical.
> 
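
To make the transport point concrete, here is a minimal Python sketch of a transport boundary that does not assume a subprocess or a particular IPC mechanism. Transport, SocketTransport, InProcessTransport and ping are hypothetical names invented for illustration; none of them come from Airflow or the AIP, and the 4-byte big-endian length prefix is an assumption rather than the actual wire format.

```python
import socket
import struct
from collections import deque
from typing import Protocol


class Transport(Protocol):
    """Hypothetical message transport; not an Airflow interface."""

    def send(self, payload: bytes) -> None: ...

    def recv(self) -> bytes: ...


class SocketTransport:
    """Length-prefixed messages over any connected stream socket."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock

    def send(self, payload: bytes) -> None:
        # Assumed framing: 4-byte big-endian length, then the payload.
        self._sock.sendall(struct.pack(">I", len(payload)) + payload)

    def _read_exact(self, n: int) -> bytes:
        chunks = []
        while n:
            chunk = self._sock.recv(n)
            if not chunk:
                raise ConnectionError("peer closed the connection")
            chunks.append(chunk)
            n -= len(chunk)
        return b"".join(chunks)

    def recv(self) -> bytes:
        (length,) = struct.unpack(">I", self._read_exact(4))
        return self._read_exact(length)


class InProcessTransport:
    """Same interface with no socket and no subprocess."""

    def __init__(self, inbox: deque, outbox: deque) -> None:
        self._inbox, self._outbox = inbox, outbox

    def send(self, payload: bytes) -> None:
        self._outbox.append(payload)

    def recv(self) -> bytes:
        return self._inbox.popleft()


def ping(client: Transport, server: Transport) -> bytes:
    """Send one request and echo it back upper-cased, whatever the transport."""
    client.send(b"ping")
    server.send(server.recv().upper())
    return client.recv()
```

The same ping function runs over a socket.socketpair() or over two in-memory deques, which is the kind of separation that would leave room for a socketpair, TCP, shared-memory or in-process coordinator behind one interface.
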
> The wire format: We currently use hand-framed msgpack, but other alternatives 
> like protobuf deserve a serious look, as Jens has already raised. The 
> forward-compat rules are exactly what protobuf gives you in the spec, and 
> codegen removes per-language hand-written framing. "We already use msgpack" 
> is a weak argument here, imo, because we are already significantly widening 
> the scope for which the original protocol was designed, and that should be 
> formalized.
> 
> I also have some other nits but I need to digest everything a bit further 
> before raising them intelligibly. 🙂
> 
> BR,
> 
> ✨ Philippe Gagnon
> Meet for 30 mins 📅 <https://calendar.app.google/5qzgD9SadybUvSCv8>
> 
> 
> On Wed, May 6, 2026 at 5:15 AM Zhe-You(Jason) Liu <[email protected] 
> <mailto:[email protected]>> wrote:
>> Hi Stefan and all,
>> 
>> We've updated the ADRs [1], which now include:
>> - All the points Jens raised (lifecycle, protocol, etc.)
>> - All the points Stefan raised (forward-compatibility,
>> backward-compatibility for the language runtime msgpack schema, etc.)
>> 
>> Additionally, we've settled on placing the Coordinator subclasses in
>> `sdk.coordinators.java` instead of overloading the Provider (which should
>> only expose features intended for Dag authors). See [2] for details.
>> Thanks to Aritra, Ash, and Elad for raising these concerns and helping us
>> settle on the right direction.
>> 
>> [1]
>> https://github.com/apache/airflow/pull/65956/changes#diff-0775f49c0e12a89d7dd25dd8908ea932bd063d6b29838786272cf8cdc7f8ed3b
>> [2] https://github.com/apache/airflow/pull/65958#issuecomment-4386290759
>> 
>> Best,
>> Jason
>> 
>> 
>> On Wed, May 6, 2026 at 10:28 AM Tzu-ping Chung via dev <
>> [email protected] <mailto:[email protected]>> wrote:
>> 
>> > Thanks Stefan, all great points.
>> >
>> > Since we are likely going to have more language SDKs proposed after the
>> > coordinator layer is defined, I plan to have the comm protocol between the
>> > task runner and the child process more formally defined. This will likely
>> > be a separate AIP after the Java SDK and coordinator go in.
>> >
>> > Before that, let’s update the ADR to describe what the Java side expects
>> > from the task runner (Python).
>> >
>> > TP
>> >
>> >
>> > > On 5 May 2026, at 05:26, Stefan Wang <[email protected] 
>> > > <mailto:[email protected]>> wrote:
>> > >
>> > > Thanks Jason,
>> > >
>> > >  +1 from my side.
>> > >
>> > > I also spent some more time and checked the feature/java-sdk branch and
>> > > have a couple of suggestions to firm up the forward-compat story before
>> > > more language SDKs land.
>> > >
>> > >  What's already there is already good: TaskSdkFrames.kt configures the
>> > >  Jackson mapper with FAIL_ON_UNKNOWN_PROPERTIES=false, so additive Core
>> > >  changes (adding a field to StartupDetails, e.g.) are absorbed silently
>> > >  on the Java side. Python side via msgspec/Pydantic is forward-compat by
>> > >  default. The codec-level defense is in place.
>> > >
>> > >  What I'd suggest documenting:
>> > >
>> > >  1. Promote that codec setting to an explicit contract in ADR 0003.
>> > >     Right now it's a one-line implementation detail. A future SDK
>> > >     author (Rust, additional Go work, future contributor swapping the
>> > >     mapper) could miss it. Worth stating the contract clearly:
>> > >     "the Coordinator IPC schema is forward-compatible. SDKs MUST
>> > >     configure their decoder to ignore unknown fields. Adding fields
>> > >     to existing messages is non-breaking; renames and type changes
>> > >     are breaking."
>> > >
>> > >     I'm raising this because I've watched the analogous trap fire in
>> > >     a Python-server / generated-Java-client setup: the Java codegen
>> > >     tool there emits its own pre-Jackson whitelist check
>> > >     (validateJsonElement throwing IllegalArgumentException on unknown
>> > >     fields, regardless of how the downstream Jackson mapper is
>> > >     configured). One additive field change in the server response
>> > >     broke every consumer until clients re-generated. The "lenient
>> > >     codec" defense only works if every binding uses a lenient codec,
>> > >     and that's a property worth stating explicitly so future SDK
>> > >     authors know what they're committing to.
>> > >
>> > >  2. The harder cases are worth a sentence too. FAIL_ON_UNKNOWN=false
>> > >     covers additions, but in Comms.kt several StartupDetails fields
>> > >     are `lateinit var` — if Core removes one of those, deserialize
>> > >     succeeds silently and the first task-side access throws
>> > >     UninitializedPropertyAccessException at runtime. A sentence in
>> > >     the ADR distinguishing "additive (non-breaking)" from "rename /
>> > >     type-change / required-field removal (breaking, deprecation
>> > >     cycle required)" gives SDK maintainers a clear rule.
>> > >
>> > >  3. Worth considering a contract test that simulates Core-ahead-of-
>> > >     SDK on the IPC layer specifically. SerializationCompatibilityTest
>> > >     covers DAG JSON well; CommsTest exercises the current protocol.
>> > >     A small fixture-based test that feeds the Java decoder a frame
>> > >     with an unknown field / missing optional field / null in an
>> > >     optional position would catch codec-config regressions before
>> > >     they hit users.
>> >
>> >
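
The contract described in points 1 to 3 above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Java or Airflow implementation: StartupFrame and decode are hypothetical names, the field set is invented, and JSON stands in for msgpack so that the example needs only the standard library.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class StartupFrame:
    """Hypothetical stand-in for a message such as StartupDetails."""

    dag_id: str                      # required field
    task_id: str                     # required field
    map_index: Optional[int] = None  # optional field


def decode(raw: bytes) -> StartupFrame:
    """Decode a frame, ignoring unknown keys and rejecting missing required ones."""
    data = json.loads(raw)
    known = {f.name for f in fields(StartupFrame)}
    try:
        # Unknown keys are dropped, so additive changes are non-breaking.
        return StartupFrame(**{k: v for k, v in data.items() if k in known})
    except TypeError as exc:
        # A required field is absent: fail at decode time, not at first access.
        raise ValueError(f"incompatible frame: {exc}") from exc
```

Feeding decode a frame with an extra key, with the optional key missing, and with null in the optional position covers the three fixture cases from point 3. A frame without task_id raises at decode time, which is the behaviour point 2 argues for instead of a late runtime failure.
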
>> > >
>> > >  None of this blocks the AIP; codec config + tests can land in a
>> > >  follow-up. Mostly want to lock the contract in writing before more
>> > >  language SDKs make assumptions that diverge.
>> > >
>> > >
>> > >  Thank you for driving this! Appreciate the effort.
>> > >
>> > >  Stefan
>> > >
>> > >
>> > >> On May 4, 2026, at 2:33 AM, Aritra Basu <[email protected] 
>> > >> <mailto:[email protected]>>
>> > wrote:
>> > >>
>> > >> Hey Jason,
>> > >>
>> > >> I'd forgotten to comment on it, I'd taken a look at the proposal and it
>> > >> looks good to me! Looking forward to seeing it go in! Was looking at the
>> > >> pure java dags code and it looked good. Will be taking a deeper look
>> > >> into it soon!
>> > >>
>> > >> --
>> > >> Regards,
>> > >> Aritra Basu
>> > >>
>> > >> On Mon, 4 May 2026, 1:30 pm Zhe-You(Jason) Liu, <[email protected] 
>> > >> <mailto:[email protected]>>
>> > wrote:
>> > >>
>> > >>> Hi everyone,
>> > >>>
>> > >>> Hoping to get a few more eyes on the open "AIP-108: Java Task SDK and
>> > >>> the Language Coordinator Layer" and ADRs before we move forward.
>> > >>> Please take a look and share any thoughts when you have a moment.
>> > >>>
>> > >>> - AIP:
>> > >>>   https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-108+Java+Task+SDK+and+the+Language+Coordinator+Layer
>> > >>> - ADRs:
>> > >>>   https://github.com/apache/airflow/pull/65956/changes/876179ab55d3b31486ee52f4c27abd4e215b0fd0
>> > >>> - PR for adding Coordinator:
>> > >>>   https://github.com/apache/airflow/pull/65958
>> > >>> - PR for adding Java-SDK: https://github.com/apache/airflow/pull/65956
>> > >>>
>> > >>> Even a short comment or a +1 on AIP helps us understand where the
>> > >>> community stands.
>> > >>> Thank you!
>> > >>>
>> > >>> Best,
>> > >>> Jason
>> > >>>
>> > >>> On Tue, Apr 28, 2026 at 10:58 PM Jarek Potiuk <[email protected] 
>> > >>> <mailto:[email protected]>>
>> > wrote:
>> > >>>
>> > >>>> Oh super - nice. I am definitely going to take a look (at both) :D
>> > >>>>
>> > >>>> On Tue, Apr 28, 2026 at 9:55 AM Tzu-ping Chung via dev <
>> > >>>> [email protected] <mailto:[email protected]>> wrote:
>> > >>>>
>> > >>>>> Hi all,
>> > >>>>>
>> > >>>>> I’ve merged the ADRs and the previous LC message (in the other
>> > >>>>> thread) into a document formatted as an AIP:
>> > >>>>>
>> > >>>>> AIP-108 Java Task SDK and the Language Coordinator Layer - Airflow -
>> > >>>>> Apache Software Foundation
>> > >>>>> <https://cwiki.apache.org/confluence/x/pY4mGQ>
>> > >>>>>
>> > >>>>>
>> > >>>>> Various sections are taken mostly directly from the other documents,
>> > >>>>> so you can probably glance over them if you’ve already read them. If
>> > >>>>> you haven’t, the AIP document would be a good place to start since it
>> > >>>>> removes some more detailed descriptions and focuses on the high level
>> > >>>>> interfaces. More details are still available in the ADR documents
>> > >>>>> included in the PRs Jason opened.
>> > >>>>>
>> > >>>>> TP
>> > >>>>>
>> > >>>>>
>> > >>>>> On 28 Apr 2026, at 11:12, Zhe-You(Jason) Liu <[email protected] 
>> > >>>>> <mailto:[email protected]>>
>> > >>> wrote:
>> > >>>>>
>> > >>>>> Hi Jens,
>> > >>>>>
>> > >>>>> The ADRs are now available here [1], hope that helps clarify some of
>> > >>>>> your concerns.
>> > >>>>>
>> > >>>>> > As I understood Java is static compiled JAR files, no on-the-fly
>> > >>>>> > compile from Java source tree (correct?)
>> > >>>>>
>> > >>>>> That's correct -- the user will compile themselves and set the
>> > >>>>> `[java] bundles_folder` config to point to the directory of those
>> > >>>>> JAR files.
>> > >>>>>
>> > >>>>> > so actually the Dag parsing concept then is quite "static" and once
>> > >>>>> > generated actually no need to re-parse the Dag in Java mode?
>> > >>>>> >
>> > >>>>> > Still if static JAR deployed how does the deploy lifecycle look
>> > >>>>> > like? Would you need to restart with new deploy the Dag Parser and
>> > >>>>> > respective workers?
>> > >>>>>
>> > >>>>> The lifecycle for dag-parsing will be the same as how the current
>> > >>>>> `DagFileProcessorProcess` acts. The coordinator comes into play
>> > >>>>> before we start the actual parse file entrypoint [2]. Regardless of
>> > >>>>> what language the parse file subprocess is implemented in, as long as
>> > >>>>> it returns a valid serialized Dag JSON in msgpack over IPC, the
>> > >>>>> behavior will remain the same. So there is no need to restart the
>> > >>>>> `airflow dag-processor`, and the `airflow worker` will not be
>> > >>>>> involved in dag processing at all.
>> > >>>>>
>> > >>>>> > Is it really realistic that a "LocalExecutor" needs to be supported
>> > >>>>> > or can we limit it to e.g. only remote executors to reduce coupling
>> > >>>>> > and complexity of operating the core?
>> > >>>>>
>> > >>>>> The coordinator is the interface that decides how we want to launch
>> > >>>>> the subprocess for both dag-processing and workload-execution. This
>> > >>>>> means it will support **any** executor out of the box, as we
>> > >>>>> integrate the coordinator at the TaskSDK level for
>> > >>>>> workload-execution [3].
>> > >>>>>
>> > >>>>> > What overhead does the "Coordinator layer" generate compared to a
>> > >>>>> > Java specific supervisor implementation?
>> > >>>>>
>> > >>>>> The "Coordinator layer" is the interface for Airflow-Core to interact
>> > >>>>> with the target language subprocess. We still need a Java-specific
>> > >>>>> supervisor implementation, which is the first PR I mentioned in
>> > >>>>> another thread -- Java SDK [4].
>> > >>>>>
>> > >>>>> > Is it not only a new / additional process but also IPC involved
>> > >>>>> > then. And at least I saw also performance problems e.g. using very
>> > >>>>> > large XComs where even heartbeats are lost due to long running IPC
>> > >>>>>
>> > >>>>> I would consider this out of scope of the "Java SDK and the
>> > >>>>> Coordinator Layer" AIP, as the current Python-native TaskSDK
>> > >>>>> supervisor will encounter the same issue. The multi-language support
>> > >>>>> here follows the same protocol that the current TaskSDK uses.
>> > >>>>>
>> > >>>>> [1]
>> > >>>>> https://github.com/apache/airflow/pull/65956/changes/876179ab55d3b31486ee52f4c27abd4e215b0fd0
>> > >>>>> [2]
>> > >>>>> https://github.com/apache/airflow/pull/65958/changes#diff-564fd0a8fbe4cc47864a8043fcc1389b33120c88bb35852b26f45c36b902f70bR540-R573
>> > >>>>> [3]
>> > >>>>> https://github.com/apache/airflow/pull/65958/changes#diff-5bef10ab2956abf7360dbf9b509b6e1113407874d24abcc1b276475051f13abfR1991-R2001
>> > >>>>> [4] https://github.com/apache/airflow/pull/65956
>> > >>>>>
>> > >>>>> Thanks.
>> > >>>>>
>> > >>>>> Best,
>> > >>>>> Jason
>> > >>>>>
>> > >>>>> On Tue, Apr 28, 2026 at 3:03 AM Jens Scheffler <[email protected] 
>> > >>>>> <mailto:[email protected]>>
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>> Thanks TP for raising this!
>> > >>>>>
>> > >>>>> I would need to sleep on the block of information in the described
>> > >>>>> details and might have some detail questions just to ensure I
>> > >>>>> understood right. So an ADR or AIP document might be better to
>> > >>>>> comment on than an email thread.
>> > >>>>>
>> > >>>>> Things that jump into my head but there would be more coming thinking
>> > >>>>> about it:
>> > >>>>>
>> > >>>>> * As I understood Java is static compiled JAR files, no on-the-fly
>> > >>>>>  compile from Java source tree (correct?) - in this case also the
>> > >>>>>  Dags are "static until re-deploy" - so actually the Dag parsing
>> > >>>>>  concept then is quite "static" and once generated actually no need
>> > >>>>>  to re-parse the Dag in Java mode?
>> > >>>>> * Still if static JAR deployed how does the deploy lifecycle look
>> > >>>>>  like? Would you need to restart with new deploy the Dag Parser and
>> > >>>>>  respective workers?
>> > >>>>> * Is it really realistic that a "LocalExecutor" needs to be supported
>> > >>>>>  or can we limit it to e.g. only remote executors to reduce coupling
>> > >>>>>  and complexity of operating the core?
>> > >>>>> * What overhead does the "Coordinator layer" generate compared to a
>> > >>>>>  Java specific supervisor implementation? Is it not only a new /
>> > >>>>>  additional process but also IPC involved then. And at least I saw
>> > >>>>>  also performance problems e.g. using very large XComs where even
>> > >>>>>  heartbeats are lost due to long running IPC
>> > >>>>>  (https://github.com/apache/airflow/issues/64628)
>> > >>>>> * (There might be more coming :-) )
>> > >>>>>
>> > >>>>> Jens
>> > >>>>>
>> > >>>>> P.S.: Questions here also do not mean rejection but I would like to
>> > >>>>> understand which complexity and overhead we have adding all this.
>> > >>>>>
>> > >>>>> On 27.04.26 15:21, Jarek Potiuk wrote:
>> > >>>>>
>> > >>>>> Also I would like to point out one thing.
>> > >>>>>
>> > >>>>> This should not be `LAZY CONSENSUS` just yet, this is quite a big
>> > >>>>> thing to discuss. I missed that the subject already had it.
>> > >>>>>
>> > >>>>> At this stage this is really a discussion (I renamed the thread),
>> > >>>>> because ... we have never discussed it before.
>> > >>>>>
>> > >>>>> LAZY CONSENSUS should really be called for after initial discussion
>> > >>>>> (on devlist) points to us actually reaching the consensus.
>> > >>>>>
>> > >>>>> While this one is unlikely to cause much controversy (I think),
>> > >>>>> sufficient time for people to discuss and digest it before calling
>> > >>>>> for lazy consensus is a necessary prerequisite. We simply need time
>> > >>>>> to build consensus.
>> > >>>>>
>> > >>>>> We have a few processes where we build a "general" consensus first,
>> > >>>>> and then we apply it only to particular cases (such as new
>> > >>>>> providers). But in most cases when we have a "big" thing to discuss,
>> > >>>>> we need to build consensus on devlist first. While in some cases
>> > >>>>> people discussed things off-list and came to some conclusions (which
>> > >>>>> is perfectly fine) - bringing it to the list as a consensus, where we
>> > >>>>> do not know if we achieved it yet, is - I think - a bit premature.
>> > >>>>>
>> > >>>>> See the lazy consensus explanation [1] and "consensus building" [2]
>> > >>>>>
>> > >>>>> [1] Lazy consensus -
>> > >>>>> https://community.apache.org/committers/decisionMaking.html#lazy-consensus
>> > >>>>>
>> > >>>>> [2] Consensus building -
>> > >>>>> https://community.apache.org/committers/decisionMaking.html#consensus-building
>> > >>>>>
>> > >>>>>
>> > >>>>> J.
>> > >>>>>
>> > >>>>> On Mon, Apr 27, 2026 at 1:27 PM Aritra Basu<[email protected] 
>> > >>>>> <mailto:[email protected]>
>> > >
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>> Hey TP,
>> > >>>>>
>> > >>>>> Overall +1, This is quite an interesting implementation. A couple
>> > >>>>> questions, is provider the right place for the coordinator? Don't
>> > >>>>> have strong opinions or alternatives, but I am curious.
>> > >>>>>
>> > >>>>> Also for the parser wanted to understand a bit better how it works? I
>> > >>>>> tried going through the SDK but wasn't able to fully understand it.
>> > >>>>> Also +1 to Jarek's recommendation for documentation.
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>> --
>> > >>>>> Regards,
>> > >>>>> Aritra Basu
>> > >>>>>
>> > >>>>> On Mon, 27 Apr 2026, 11:39 am Tzu-ping Chung via dev, <
>> > >>>>> [email protected] <mailto:[email protected]>> wrote:
>> > >>>>>
>> > >>>>> Hi all,
>> > >>>>>
>> > >>>>> As mentioned in the latest dev call, we have been developing a Java
>> > >>>>> SDK with changes to Airflow in a separate fork[1]. We plan to start
>> > >>>>> merging the Java SDK work back into the OSS repository.
>> > >>>>>
>> > >>>>> We see this as a natural step following initial work in AIP-72[2],
>> > >>>>> which created “a clean language agnostic interface for task
>> > >>>>> execution, with support for multiple language bindings” (quoted from
>> > >>>>> the proposal).
>> > >>>>>
>> > >>>>> The Java SDK also uses Ash’s addition of @task.stub[3] for the Go
>> > >>>>> SDK, to declare a task in a DAG to be “implemented elsewhere” (not in
>> > >>>>> the annotated function). Similar to the Go SDK, we also created a
>> > >>>>> Java library that users can use to write task implementations for
>> > >>>>> Airflow to execute at runtime.
>> > >>>>>
>> > >>>>> [1]:https://github.com/astronomer/airflow/tree/feature/java-all
>> > >>>>> [2]:https://cwiki.apache.org/confluence/x/xgmTEg
>> > >>>>> [3]:https://github.com/apache/airflow/pull/56055
>> > >>>>>
>> > >>>>> The user-facing syntax for a stub task would be the same as
>> > >>>>> implemented by the Go SDK:
>> > >>>>>
>> > >>>>>   @task.stub(queue="java-tasks")
>> > >>>>>   def my_task(): ...
>> > >>>>>
>> > >>>>> With a new configuration option to map tasks in a queue to be
>> > >>>>> executed by a specific SDK:
>> > >>>>>
>> > >>>>>   [sdk]
>> > >>>>>   queue_to_sdk = {"java-tasks": "java"}
>> > >>>>>
>> > >>>>> The configuration is needed for some executors the Go SDK currently
>> > >>>>> does not support. The Go SDK currently relies on each executor worker
>> > >>>>> process to specify which queues they listen to, but this is not
>> > >>>>> always viable, since some executors (LocalExecutor, for example) do
>> > >>>>> not have the concept of worker processes.
>> > >>>>>
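
As a rough illustration of the routing described above, the lookup could work along these lines. This is a sketch under assumptions: parse_queue_to_sdk and sdk_for_queue are hypothetical helper names, and falling back to "python" for unmapped queues is an assumption the proposal does not state.

```python
import json

DEFAULT_SDK = "python"  # assumed fallback; the proposal does not specify one


def parse_queue_to_sdk(raw: str) -> dict[str, str]:
    """Parse the JSON value of a queue_to_sdk style option into a mapping."""
    mapping = json.loads(raw)
    if not isinstance(mapping, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()
    ):
        raise ValueError("queue_to_sdk must map queue names to SDK names")
    return mapping


def sdk_for_queue(queue: str, mapping: dict[str, str]) -> str:
    """Return the SDK that should run tasks from the given queue."""
    return mapping.get(queue, DEFAULT_SDK)
```

With the example value {"java-tasks": "java"}, a task declared with queue="java-tasks" resolves to the Java SDK without the executor needing any notion of per-worker queues.
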
>> > >>>>> The Coordinator Layer
>> > >>>>> =====================
>> > >>>>>
>> > >>>>> When the Go SDK was implemented, it left out Runtime Airflow plugins
>> > >>>>> as a future topic. This includes custom XCom backends, secrets
>> > >>>>> backends lookup for connections and variables, etc. These components
>> > >>>>> are implemented in Python, and a Java task cannot easily use the
>> > >>>>> feature unless we also implement the lookup logic in Java. We don’t
>> > >>>>> want to do that since it introduces significant overhead to writing
>> > >>>>> plugins, and the overhead multiplies with each new language SDK.
>> > >>>>>
>> > >>>>> Fortunately, the current execution-time task runner already uses a
>> > >>>>> two-layer design. When an executor wants to run a task, it starts a
>> > >>>>> (Python) task runner process that talks to Airflow Core through the
>> > >>>>> Execution API, and *forks* another (Python) process, which talks to
>> > >>>>> the task runner through TCP, to run the actual task code. Airflow
>> > >>>>> plugins simply go into the task runner process.
>> > >>>>>
>> > >>>>> This design works well for us since it keeps all the Airflow plugins
>> > >>>>> in Python. The only thing missing is an abstraction for the task
>> > >>>>> runner process to run tasks in any language. We are calling this new
>> > >>>>> layer the **Coordinator**.
>> > >>>>>
>> > >>>>> When a DAG bundle is loaded, it not only tells Airflow how to find
>> > >>>>> the DAGs (and the tasks in them), but also how to *run* each task.
>> > >>>>> Current Python tasks use the Python Coordinator, running tasks by
>> > >>>>> forking as previously described. A new JVM Coordinator will instruct
>> > >>>>> the task runner how to run tasks packaged in JAR files.
>> > >>>>>
>> > >>>>> Each coordinator implements a base interface (BaseRuntimeCoordinator)
>> > >>>>> that handles three concerns:
>> > >>>>>
>> > >>>>> - Discovery: determining whether a given file belongs to this
>> > >>>>>   coordinator (e.g. JAR files for Java).
>> > >>>>> - DAG parsing: returning a runtime-specific subprocess command to
>> > >>>>>   parse DAG files in the target language.
>> > >>>>> - Task execution: returning a runtime-specific subprocess command to
>> > >>>>>   execute tasks in the target runtime.
>> > >>>>>
>> > >>>>> The base class owns the full bridge lifecycle (TCP servers,
>> > >>>>> subprocess management, and cleanup), so language providers only need
>> > >>>>> to implement these three methods.
>> > >>>>>
>> > >>>>> The coordinator translates a DagFileParseRequest (for DAG parsing)
>> > >>>>> and StartupDetails (for Task execution) data model (as declared in
>> > >>>>> Airflow) into the appropriate commands for the target runtime. For
>> > >>>>> example, a “java -classpath ... /path/to/MainClass ...” subprocess
>> > >>>>> command that points to the correct JAR file and main class in this
>> > >>>>> case.
>> > >>>>>
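
The three concerns listed above could take roughly the following shape. This is a sketch only, not the BaseRuntimeCoordinator from the PR: the class names, method names, the "parse" and "execute" arguments, and the com.example.Main entry point are all hypothetical, and the bridge lifecycle owned by the real base class is left out.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class RuntimeCoordinatorSketch(ABC):
    """Hypothetical shape of the three methods a language provider implements."""

    @abstractmethod
    def can_handle(self, path: Path) -> bool:
        """Discovery: does this file belong to this coordinator?"""

    @abstractmethod
    def parse_command(self, path: Path) -> list[str]:
        """DAG parsing: subprocess command that parses the given file."""

    @abstractmethod
    def execute_command(self, path: Path, task_id: str) -> list[str]:
        """Task execution: subprocess command that runs one task."""


class JavaCoordinatorSketch(RuntimeCoordinatorSketch):
    def __init__(self, main_class: str = "com.example.Main") -> None:
        self.main_class = main_class  # assumed entry point, for illustration

    def can_handle(self, path: Path) -> bool:
        return path.suffix == ".jar"

    def parse_command(self, path: Path) -> list[str]:
        return ["java", "-classpath", str(path), self.main_class, "parse"]

    def execute_command(self, path: Path, task_id: str) -> list[str]:
        return ["java", "-classpath", str(path), self.main_class, "execute", task_id]
```

Returning argument lists rather than shell strings keeps the commands safe to pass to a subprocess call without shell quoting.
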
>> > >>>>> Coordinators as Airflow Providers
>> > >>>>> =================================
>> > >>>>>
>> > >>>>> The base coordinator interface and the Python coordinator will live
>> > >>>>> in “airflow.sdk.execution_time”. Other coordinators (for foreign
>> > >>>>> languages) are registered through the existing Airflow provider
>> > >>>>> mechanism. Each SDK provider declares its coordinator in its
>> > >>>>> provider.yaml under a “coordinators” extension point. Both
>> > >>>>> ProvidersManager (airflow-core) and ProvidersManagerTaskRuntime
>> > >>>>> (task-sdk) discover coordinators through this extension point. This
>> > >>>>> means adding a new language runtime requires only a provider package.
>> > >>>>> No changes to Airflow Core are needed.
>> > >>>>>
>> > >>>>> The new JVM-based coordinator will live in the namespace
>> > >>>>> “airflow.providers.sdk.java”. This is not the most accurate name
>> > >>>>> (technically it should be “jvm” instead), but in practice most users
>> > >>>>> will recognize it, and (from my understanding) other JVM language
>> > >>>>> users (e.g. Kotlin, Scala) are already well-versed enough dealing
>> > >>>>> with Java interoperability to understand “java” means JVM in this
>> > >>>>> context.
>> > >>>>>
>> > >>>>> Writing DAGs in Java
>> > >>>>> ====================
>> > >>>>>
>> > >>>>> This is not strictly connected to AIP-72, but considered by us as a
>> > >>>>> natural next step since we can now implement tasks in a foreign
>> > >>>>> language.
>> > >>>>>
>> > >>>>> Being able to define the DAG in the same language as the task
>> > >>>>> implementation is useful since writing Python, even if only with
>> > >>>>> minimal syntax, is still a hurdle for those not already familiar
>> > >>>>> with it, or not even allowed to run it. There are mainly three things
>> > >>>>> we need on top of the task implementation interface:
>> > >>>>>
>> > >>>>> - DAG flags (e.g. schedule, max_active_tasks)
>> > >>>>> - Task flags (e.g. trigger_rule, weight_rule)
>> > >>>>> - Task dependencies
>> > >>>>>
>> > >>>>> A proof-of-concept implementation is included with other changes
>> > >>>>> proposed elsewhere in this document.
>> > >>>>>
>> > >>>>> Lazy Consensus Topics
>> > >>>>> =====================
>> > >>>>>
>> > >>>>> We’re calling for lazy consensus for the following topics:
>> > >>>>>
>> > >>>>> - A new “queue_to_sdk” configuration option to route tasks to a
>> > >>>>>   specific language SDK.
>> > >>>>> - A new coordinator layer in the SDK to route implementations at
>> > >>>>>   execution time.
>> > >>>>> - New providers under airflow.providers.sdk to provide additional
>> > >>>>>   language support.
>> > >>>>> - Develop the Go SDK to support the proposed model and a provider
>> > >>>>>   package for the coordinator. (Existing features stay as-is; no
>> > >>>>>   breaking changes.)
>> > >>>>> - Add the new Java SDK and the corresponding provider package.
>> > >>>>>
>> > >>>>> TP
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>> ---------------------------------------------------------------------
>> > >>>>> To unsubscribe, e-mail:[email protected] 
>> > >>>>> <mailto:e-mail%[email protected]>
>> > >>>>> For additional commands, e-mail:[email protected] 
>> > >>>>> <mailto:e-mail%[email protected]>
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> >
