potiuk opened a new pull request, #68147: URL: https://github.com/apache/airflow/pull/68147
The TCP connection-ownership check added in #67781 only accepted the supervisor channel when the connecting peer belonged to the spawned process's *exact* PID. In the real Java SDK PROD e2e the JVM's loopback connection is not found under that single PID, so both the `comm` and `logs` channels are rejected, the task subprocess dies with `process exited with 1 before connecting`, and every Java task fails (e.g. `java_annotation_example.extract`). The Java SDK e2e suite is canary-only, so it did not run on #67781; the breakage only surfaced in the nightly `main` runs (red since 2026-05-30).

This widens the trust boundary to the child process **or any of its descendants**: a launcher (JVM, shell wrapper, or any runtime that forks a worker) legitimately connects back from a descendant rather than the launched PID. A process *outside* the spawned subtree is still rejected, so the hardening #67781 added is preserved.

The ownership lookup is also retried briefly to absorb the race where a freshly established connection is not yet visible in `/proc`.

Validated by the canary `Java SDK e2e tests with PROD image` job (forced on this PR via the `canary` label), plus unit coverage for the descendant-connection and retry paths.

related: #67781

---

##### Was generative AI tooling used to co-author this PR?

- [X] Yes: Claude Code (Opus 4.8)

Generated-by: Claude Code (Opus 4.8) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
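
For readers who want the shape of the change without opening the diff, the descendant check and the brief retry described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the names `is_in_subtree`, `peer_is_trusted`, `find_owner_pid`, and `get_ppid`, as well as the attempt count and delay, are hypothetical, and the lookups that would read `/proc` are passed in as callables so the logic can be exercised without a real process tree.

```python
import time
from typing import Callable, Optional


def is_in_subtree(
    pid: int,
    root_pid: int,
    get_ppid: Callable[[int], Optional[int]],
) -> bool:
    """Return True if ``pid`` is ``root_pid`` or one of its descendants.

    ``get_ppid`` is a hypothetical lookup returning the parent PID of a
    process, or None if the process is unknown (for example, already gone).
    """
    seen = set()
    current: Optional[int] = pid
    # Walk up the parent chain; stop at the top of the tree, on an unknown
    # process, or if a cycle is detected (defensive guard).
    while current is not None and current > 0 and current not in seen:
        if current == root_pid:
            return True
        seen.add(current)
        current = get_ppid(current)
    return False


def peer_is_trusted(
    find_owner_pid: Callable[[], Optional[int]],
    root_pid: int,
    get_ppid: Callable[[int], Optional[int]],
    attempts: int = 5,          # assumed value, for illustration only
    delay: float = 0.05,        # assumed value, for illustration only
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Accept the peer only if its owning PID lies in the spawned subtree.

    ``find_owner_pid`` is a hypothetical lookup returning the PID that owns
    the connecting socket, or None while the connection is not yet visible.
    The lookup is retried a few times to absorb that short-lived race.
    """
    for attempt in range(attempts):
        owner = find_owner_pid()
        if owner is not None:
            # The owner is known: decide immediately, no further retries.
            return is_in_subtree(owner, root_pid, get_ppid)
        if attempt < attempts - 1:
            sleep(delay)
    # The owner never became visible: reject rather than trust blindly.
    return False
```

In this sketch a known owner outside the subtree is rejected immediately, and only the "not yet visible" case is retried, which keeps the retry from weakening the rejection of unrelated processes.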
