tkaymak opened a new pull request, #39088:
URL: https://github.com/apache/beam/pull/39088

   Follow-up to #38966. The first 
`beam_PostCommit_Python_Xlang_Messaging_Direct` runs (the scheduled master run 
and a trigger-only retrigger in #39086) **failed deterministically** on 
`CrossLanguageMqttIOTest::test_xlang_mqtt_read_write_streaming` in both py310 
and py314.
   
   ### Root cause
   The unbounded read/write pipeline was launched with a **blocking** `p.run()` 
(`TestPipeline` defaults to `blocking=True`, so `run()` calls 
`wait_until_finish`). For a never-terminating streaming pipeline this means the 
observe-then-cancel logic after `p.run()` is effectively dead code — the job 
runs unattended until it dies. In CI it died with the worker data plane 
reporting `Stream removed (Socket closed)` and the state stream 
`DEADLINE_EXCEEDED`.
   
   The test also constructed a fresh `PipelineOptions([...])`, discarding the 
harness-provided test pipeline options — the point @Abacn raised in review 
(#38966).
   
   ### Fix
   - **Amend** the harness `TestPipeline` options instead of discarding them: 
enable streaming and run **non-blocking**, so the subscriber can collect the 
relayed records and the pipeline is cancelled cleanly.
   - **Keep** targeting the Prism portable runner. `SwitchingDirectRunner` 
explicitly disables its Prism delegation for pipelines containing external 
(cross-language) transforms (`runners/direct/direct_runner.py`) and falls back 
to `BundleBasedDirectRunner`, which cannot execute an unbounded read — so for 
this xlang streaming case the runner override is still required. (Re: @Abacn's 
note that the direct runner now defaults to Prism — that delegation is skipped 
for xlang pipelines.)
   
   The trigger-file bump runs the PostCommit against this PR (checked out as 
the PR merge ref) so the fix is validated in CI.
   
   Note: I couldn't fully reproduce the end-to-end pass locally — the Java SDK 
harness container needed by the xlang worker hits a docker image-resolution 
quirk in my sandbox that doesn't occur on the CI runners — so I'm relying on 
this PR's PostCommit for end-to-end validation.
   
   R: @Abacn


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to