jason810496 opened a new issue, #67111:
URL: https://github.com/apache/airflow/issues/67111
### Background
In the AIP-108 dev@ thread ([2026-05-13 reply][maciej-reply]), Maciej
Obuchowski outlined what OpenLineage needs from the Java SDK:
- **Generic task lifecycle events** — when OL is enabled, every task
execution emits OL start / complete / failure events via the listener
framework, driven by the Python task runner. For Java tasks, the listener
calls still fire from the Python side around the Java subprocess (see the
`on_task_instance_running` / `on_task_instance_success` calls around
`task-sdk/src/airflow/sdk/execution_time/task_runner.py:1196` and
`:1921`). No code change is required to keep this working for the
"task ran, task succeeded" signal.
- **Operator/hook-specific lineage data** — the part that currently relies
on Python operators/hooks being in-process and reading state from the
`TaskInstance` after `execute()`. This does **not** work for a Java task
because the user code runs in a JVM subprocess that the Python listener
cannot introspect.
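
A minimal Python sketch of the pattern in the first bullet may help make the split concrete: the Python side fires lifecycle hooks around a foreign-language subprocess. This is not the actual `task_runner.py` code. `TaskListener`, `run_foreign_task`, and `RecordingListener` are hypothetical names, and a Python one-liner stands in for the JVM so the sketch runs anywhere. It also illustrates the second bullet: the hooks only observe the exit status, never the subprocess's in-memory state.

```python
import subprocess
import sys
from typing import Protocol, Sequence


class TaskListener(Protocol):
    """Hypothetical stand-in for the listener hooks the Python task runner calls."""

    def on_running(self, task_id: str) -> None: ...
    def on_success(self, task_id: str) -> None: ...
    def on_failure(self, task_id: str, error: str) -> None: ...


def run_foreign_task(task_id: str, cmd: Sequence[str], listener: TaskListener) -> int:
    """Run a non-Python task in a subprocess, firing lifecycle hooks from Python.

    The listener only learns that the task ran and how it exited; it cannot
    inspect objects created inside the subprocess (e.g. a JVM's heap).
    """
    listener.on_running(task_id)
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    if proc.returncode == 0:
        listener.on_success(task_id)
    else:
        listener.on_failure(task_id, proc.stderr.strip())
    return proc.returncode


class RecordingListener:
    """Collects hook calls so the lifecycle order can be inspected."""

    def __init__(self) -> None:
        self.events: list[tuple[str, ...]] = []

    def on_running(self, task_id: str) -> None:
        self.events.append(("running", task_id))

    def on_success(self, task_id: str) -> None:
        self.events.append(("success", task_id))

    def on_failure(self, task_id: str, error: str) -> None:
        self.events.append(("failure", task_id, error))


if __name__ == "__main__":
    listener = RecordingListener()
    # Python one-liners stand in for something like `java -jar task.jar`.
    run_foreign_task("ok_task", [sys.executable, "-c", "print('hello')"], listener)
    run_foreign_task("bad_task", [sys.executable, "-c", "import sys; sys.exit('boom')"], listener)
    print(listener.events)
```

Running it records `running`/`success` for the first task and `running`/`failure` (with the stderr text `boom`) for the second. Anything the subprocess computed internally, such as lineage, never reaches the listener.
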
Maciej's conclusion: v1 of the Java SDK does not need OL emission from
inside the Java task, but the IPC and Java-side API must **not block** a
future lineage interface from being added. Concretely:
> "able to send serialized data back from the task execution to Python;
> and an API in the Java SDK for users to be able to specify that data."
### What needs to happen
1. **Reserve a lineage channel on the coordinator IPC.** When the Java
subprocess returns task results to the supervisor, the protocol should
allow an optional serialized lineage payload alongside the existing
result message. The `BaseCoordinator` interface needs to expose a
hook that the supervisor calls with that payload (a no-op by default).
2. **Expose a Java SDK API for users to declare lineage data.** It should
have a minimal shape, mirroring how Python tasks can attach lineage to a
`TaskInstance` today. The exact API is to be decided once the IPC channel
exists, but it should be:
   - opt-in (no overhead for tasks that don't use it),
   - idiomatic Java,
   - resolvable to whatever serialized form the Python listener expects.
3. **Wire the payload into the existing listener pipeline.** The Python
supervisor should forward the serialized lineage data into the same
`get_listener_manager().hook.*` call chain that already runs around
Java task execution, so OL providers don't need any Java-specific code
path.
4. **Document the v1 boundary clearly.** The coordinator user guide should
state that OL start/complete events fire for Java tasks today, but
Java-side lineage extraction is a follow-up.
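
A minimal sketch of steps 1 and 3 together follows, assuming a JSON-encoded result message. JSON is chosen only for readability; the real coordinator IPC encoding may differ. `TaskResult`, `BaseCoordinatorSketch.on_lineage`, `ForwardingCoordinator`, and `handle_result` are hypothetical names, not the actual `BaseCoordinator` API. A plain callback stands in for the `get_listener_manager().hook.*` chain because which hook should receive the payload is still open. The key design choice is that an unset `lineage` is omitted from the message entirely, so a task that declares no lineage sends exactly the bytes it sent before the field existed.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class TaskResult:
    """Hypothetical result message the language subprocess sends to the supervisor."""

    state: str
    # Optional, opaque lineage payload. None means "the task declared no lineage".
    lineage: Optional[dict[str, Any]] = None

    def to_wire(self) -> bytes:
        msg: dict[str, Any] = {"type": "TaskResult", "state": self.state}
        # Omit the key when unset so tasks without lineage produce
        # byte-for-byte the same message as before the field existed.
        if self.lineage is not None:
            msg["lineage"] = self.lineage
        return json.dumps(msg, separators=(",", ":"), sort_keys=True).encode()

    @classmethod
    def from_wire(cls, raw: bytes) -> "TaskResult":
        msg = json.loads(raw)
        # .get() lets senders that never include "lineage" decode cleanly.
        return cls(state=msg["state"], lineage=msg.get("lineage"))


class BaseCoordinatorSketch:
    """Hypothetical shape of the lineage hook; not the real BaseCoordinator."""

    def on_lineage(self, task_id: str, payload: dict[str, Any]) -> None:
        """Called by the supervisor when a result carries lineage. No-op by default."""


class ForwardingCoordinator(BaseCoordinatorSketch):
    """Forwards the payload to a listener-style callback (step 3)."""

    def __init__(self, listener_hook: Callable[[str, dict[str, Any]], None]) -> None:
        self._listener_hook = listener_hook

    def on_lineage(self, task_id: str, payload: dict[str, Any]) -> None:
        self._listener_hook(task_id, payload)


def handle_result(task_id: str, raw: bytes, coordinator: BaseCoordinatorSketch) -> TaskResult:
    """Supervisor side: decode, then call the lineage hook only if a payload is present."""
    result = TaskResult.from_wire(raw)
    if result.lineage is not None:
        coordinator.on_lineage(task_id, result.lineage)
    return result
```

Because decoding uses `msg.get("lineage")`, a supervisor that knows about the field still accepts messages from SDK versions that never send it.
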
### Acceptance criteria
- A user enabling OpenLineage sees start/complete events for Java stub
tasks in the same way as for Python tasks (no regression from current
behavior).
- The coordinator IPC has a documented optional lineage field; a Java
task that does not emit lineage produces exactly the same wire traffic
as today.
- A Java SDK user can attach lineage data from a task and see it land in
the Python supervisor as a serialized payload available to OL listeners.
- The OL provider's listener does not need a Java-specific branch — the
Java payload reaches it through the same listener hooks as Python.
- The coordinator user guide states the v1 boundary and links to this
issue for follow-up work.
### Context
- Dev@ thread:
<https://lists.apache.org/thread/gjot4bxj9kygj2fk76kx6tyg8s4hr057>
(Maciej Obuchowski's reply on 2026-05-13, "Generic task information and
specific lineage data ..."). Jarek Potiuk's earlier message on the same
day pinged Maciej and Kacper for OL input.
- AIP-108 wiki:
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-108+Language+Task+SDK+and+the+Language+Coordinator+Layer>
- Originating PRs: apache/airflow#65956 (Java SDK), apache/airflow#65958
(Coordinator layer).
- Related work:
  - #66543: Java-based task and Dag-level callbacks. Callbacks may be
    the mechanism (or share machinery) for shipping lineage data back to
    Python.
  - #66838: Pluggable communication channels. The IPC reservation
    in step 1 should be designed against whatever shape `BaseCoordinator`
    settles on there.
  - #66590: Compatible protocol between coordinator and lang-SDK. The
    optional lineage field needs to fit the forward-compat contract being
    defined there.

[maciej-reply]: https://lists.apache.org/thread/gjot4bxj9kygj2fk76kx6tyg8s4hr057