hjtran opened a new pull request, #37556:
URL: https://github.com/apache/beam/pull/37556

   ## Summary
   - Fix composite transform output registration to use all declared tags instead of the lazily populated `_pcolls` dict
   - Prevent missing outputs in the pipeline proto when a `DoOutputsTuple` subclass has not accessed all of its outputs by registration time
   - Add a regression test for a `DoOutputsTuple` subclass with an `__or__` override
   
   ## Description
   
   PR #36220 changed composite output registration to iterate over `result._pcolls.items()`. However, `_pcolls` is populated lazily: a PCollection is added only when it is accessed via `__getitem__`. As a result, outputs that have not been accessed are silently dropped from the composite's registered outputs.
   
   This breaks when a `DoOutputsTuple` subclass (e.g. one that overrides 
`__or__` to pipe to the main output) is returned from a composite's `expand()`. 
At registration time, only outputs that happened to be accessed are in 
`_pcolls`, so the main output may be missing from the pipeline proto. This 
causes disconnected edges in pipeline visualization.
   
   **Fix:** Iterate over all declared tags (`_main_tag` + `_tags`) and access 
each via `result[tag]` to trigger lazy creation, ensuring all outputs are 
registered.
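
The difference between the two registration strategies can be sketched with a toy container. This is a simplified stand-in, not Beam's actual code: `LazyOutputs`, `register_from_cache`, and `register_all_declared` are hypothetical names, and plain strings take the place of PCollections. Only the attribute names `_main_tag`, `_tags`, and `_pcolls` are taken from the description above.

```python
class LazyOutputs:
    """Toy stand-in for a multi-output container that fills `_pcolls` lazily."""

    def __init__(self, main_tag, tags):
        self._main_tag = main_tag
        self._tags = tuple(tags)
        self._pcolls = {}  # populated only when an output is accessed

    def __getitem__(self, tag):
        # Create the output on first access and cache it afterwards.
        if tag not in self._pcolls:
            self._pcolls[tag] = f"pcoll[{tag}]"
        return self._pcolls[tag]


def register_from_cache(result):
    """Register only outputs already in the cache (the problematic behavior)."""
    return dict(result._pcolls)


def register_all_declared(result):
    """Register every declared tag; indexing triggers lazy creation."""
    declared = (result._main_tag, *result._tags)
    return {tag: result[tag] for tag in declared}


result = LazyOutputs("main", ["errors", "late"])
_ = result["errors"]  # only one output is touched before registration

cached = register_from_cache(result)
complete = register_all_declared(result)

print(sorted(cached))    # ['errors']
print(sorted(complete))  # ['errors', 'late', 'main']
```

In this sketch the cache-based registration misses both `main` and `late`, while iterating over the declared tags picks up all three regardless of which outputs were accessed beforehand.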
   
   ## Test plan
   - [ ] New test `test_do_outputs_tuple_subclass_registers_all_outputs` verifies consumed PCollection appears in composite's proto outputs
   - [ ] Existing `test_multiple_outputs_composite_ptransform` updated to expect all 3 outputs (main + 2 tagged)
   - [ ] Full `pipeline_test.py` suite passes (63 passed, 2 skipped)
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

