hjtran opened a new pull request, #37556: URL: https://github.com/apache/beam/pull/37556
## Summary

- Fix composite transform output registration to use all declared tags instead of the lazily-populated `_pcolls` dict
- Prevents missing outputs in the pipeline proto when `DoOutputsTuple` subclasses haven't accessed all outputs at registration time
- Add regression test for a `DoOutputsTuple` subclass with an `__or__` override

## Description

PR #36220 changed composite output registration to iterate over `result._pcolls.items()`. However, `_pcolls` is lazily populated: PCollections are only added when accessed via `__getitem__`. This means unaccessed outputs are silently dropped from the composite's registered outputs.

This breaks when a `DoOutputsTuple` subclass (e.g. one that overrides `__or__` to pipe to the main output) is returned from a composite's `expand()`. At registration time, only outputs that happened to be accessed are in `_pcolls`, so the main output may be missing from the pipeline proto. This causes disconnected edges in pipeline visualization.

**Fix:** Iterate over all declared tags (`_main_tag` + `_tags`) and access each via `result[tag]` to trigger lazy creation, ensuring all outputs are registered.

## Test plan

- [ ] New test `test_do_outputs_tuple_subclass_registers_all_outputs` verifies that the consumed PCollection appears in the composite's proto outputs
- [ ] Existing `test_multiple_outputs_composite_ptransform` updated to expect all 3 outputs (main + 2 tagged)
- [ ] Full `pipeline_test.py` suite passes (63 passed, 2 skipped)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
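The lazy-population problem described above can be illustrated with a small standalone sketch. This is not Beam's code: `LazyOutputs`, `register_from_cache`, and `register_from_declared_tags` are hypothetical names invented for illustration, and plain strings stand in for PCollections. Only the attribute names `_pcolls`, `_main_tag`, and `_tags` are taken from the description, and their exact types here are assumptions.

```python
class LazyOutputs:
    """Toy stand-in (hypothetical) for a lazily populated multi-output container."""

    def __init__(self, main_tag, tags):
        self._main_tag = main_tag
        self._tags = tuple(tags)
        self._pcolls = {}  # filled only when a tag is accessed

    def __getitem__(self, tag):
        if tag != self._main_tag and tag not in self._tags:
            raise KeyError(tag)
        if tag not in self._pcolls:
            # Lazy creation: the output exists only after first access.
            self._pcolls[tag] = f"pcoll<{tag}>"
        return self._pcolls[tag]


def register_from_cache(result):
    # Mirrors the problematic pattern: sees only outputs already accessed.
    return dict(result._pcolls.items())


def register_from_declared_tags(result):
    # Mirrors the described fix: visit every declared tag, forcing creation.
    all_tags = (result._main_tag,) + result._tags
    return {tag: result[tag] for tag in all_tags}


outputs = LazyOutputs("main", ["even", "odd"])
_ = outputs["even"]  # only one tagged output has been touched so far

cached = register_from_cache(outputs)
declared = register_from_declared_tags(outputs)

print(sorted(cached))    # ['even']
print(sorted(declared))  # ['even', 'main', 'odd']
```

In this toy model, the cache-based registration misses `main` and `odd` because nothing had accessed them yet, while iterating over the declared tags registers all three outputs.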
