uranusjr commented on code in PR #68603:
URL: https://github.com/apache/airflow/pull/68603#discussion_r3449214946

##########
java-sdk/README.md:
##########
@@ -181,31 +181,151 @@ newlines, which does not work well in a Gradle properties file.
 credentials instead: `ASF_NEXUS_USERNAME`, `ASF_NEXUS_PASSWORD`, `SIGNING_KEY`, and `SIGNING_PASSWORD`. This is especially useful on e.g. CI.

-## Technical Details
+## Contributing
+
+The user implements a Java application containing task methods annotated (or
+registered) with the SDK. The application is packaged as a bundle and placed
+where Airflow can find it.
+
+When the Airflow supervisor identifies that a task should run with Java, it
+launches the JVM application as a subprocess. The flow is:
+
+1. `JavaCoordinator.execute_task()` (Python) scans `jars_root`, builds the
+   classpath, and spawns `java -cp <jars> <MainClass> --comm=<host>:<port>
+   --logs=<host>:<port>`.
+2. `Server.kt` connects to both sockets immediately on startup.
+3. The supervisor sends a `StartupDetails` MessagePack message; the JVM reads
+   it, looks up the matching task by `dag_id` + `task_id`, and calls the
+   user's task method.
+4. During execution the JVM sends requests to the supervisor (GetVariable,
+   GetConnection, GetXCom, SetXCom, etc.) and the supervisor responds. All
+   frames are a 4-byte big-endian length prefix followed by a MessagePack
+   payload.
+5. On completion (or exception) the JVM sends a `TaskState` message and closes
+   the socket. The JVM process then exits.
+
+Log messages produced by the SDK (not by user code) are forwarded over the
+`--logs` socket so the supervisor can append them to Airflow's log store.
+
+The wire protocol is defined in
+`task-sdk/src/airflow/sdk/execution_time/schema/schema.json`.
+`execution/Comm.kt` implements the framing layer. Adding a new message type
+requires changes in **both** `schema.json` (Python side) and
+`execution/Comm.kt` + `execution/Client.kt` (JVM side).
+
+See [Architectural Design Records](./adr) in the `adr` directory to learn more.
+


Review Comment:
The Gradle part is fine; the agent can read Gradle configuration better than a human anyway. I added a section on interaction with the JavaCoordinator.


##########
.agents/skills/airflow-java-sdk/SKILL.md:
##########
@@ -0,0 +1,86 @@
+---
+name: airflow-java-sdk
+description: >
+  Guide for contributing to the Airflow Java SDK (AIP-108). Use this skill
+  whenever a contributor is working in the `java-sdk/` directory or on the Java
+  coordinator in `task-sdk/src/airflow/sdk/coordinators/java/` — whether they
+  want to add a feature, write tests, fix a bug, understand the architecture, or
+  prepare a PR. Trigger on phrases like "Java SDK", "JavaCoordinator",
+  "java-sdk", "annotation processor", "Builder.Task", "BundleBuilder", or
+  anything about running JVM tasks in Airflow.
+---
+
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Airflow Java SDK contributor guide
+
+The Java SDK lets Airflow tasks execute JVM code (Java, Kotlin, or any JVM language). You are helping
+a contributor work in one or both of these locations:
+
+- **`java-sdk/`** — the JVM-side library (Kotlin source, published to Maven)
+- **`task-sdk/src/airflow/sdk/coordinators/java/`** — the Python coordinator that launches the JVM subprocess
+


Review Comment:
Added
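
The framing rule in step 4 of the README hunk (a 4-byte big-endian length prefix followed by a MessagePack payload) can be sketched on the Python side as below. This is a minimal illustration, not the code in `execution/Comm.kt` or the task SDK: `write_frame`, `read_frame`, and `_read_exactly` are hypothetical names, the prefix is assumed to be an unsigned 32-bit integer that counts only the payload bytes, and the payload is treated as already-encoded opaque bytes so no MessagePack library is needed.

```python
import io
import struct

# Assumption: the length prefix is an unsigned 32-bit big-endian integer
# covering the payload only (the README says "4-byte big-endian length prefix").
_LENGTH_PREFIX = struct.Struct(">I")


def write_frame(stream, payload: bytes) -> None:
    """Write one frame: 4-byte length prefix, then the payload bytes."""
    stream.write(_LENGTH_PREFIX.pack(len(payload)))
    stream.write(payload)


def _read_exactly(stream, n: int) -> bytes:
    """Read exactly n bytes, looping because a read may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"stream closed with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(stream) -> bytes:
    """Read one frame and return its payload (still MessagePack-encoded)."""
    (length,) = _LENGTH_PREFIX.unpack(_read_exactly(stream, _LENGTH_PREFIX.size))
    return _read_exactly(stream, length)


if __name__ == "__main__":
    # b"\x81\xa1k\x01" is the MessagePack encoding of the map {"k": 1};
    # it stands in for a real message such as StartupDetails.
    buf = io.BytesIO()
    write_frame(buf, b"\x81\xa1k\x01")
    buf.seek(0)
    print(read_frame(buf))
```

The read side loops in `_read_exactly` because a socket-backed stream may return fewer bytes than requested; treating a short read as a complete frame would desynchronize every later message on the connection.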
