On Fri, Feb 6, 2026 at 2:36 PM Joey Tran <[email protected]> wrote:
>
> On Fri, Feb 6, 2026 at 4:43 PM Danny McCormick <[email protected]>
> wrote:
>>
>> On Fri, Feb 6, 2026 at 4:22 PM Joey Tran <[email protected]> wrote:
>>>
>>> FWIW, much of the value of this proposal to me is the better readability
>>> from not having to consider multiple versions of transforms and not having
>>> to break up chains to extract main outputs. I appreciate though that we'd
>>> be making a trade-off of readability of the "sad path" for readability of
>>> the "happy path"
>>
>>
>> Yeah, that makes sense; what do you think of the other alternative mentioned
>> as an option for optimizing for both kinds of readability? Specifically,
>> allowing:
>>
>> pcoll | Partition(...)['main'] | ChainedParDo()
>>
>> I guess the downside there is education (all pipeline authors need to know
>> this is an option as opposed to only one expert transform author), but I'm
>> curious if it is sufficient for your context.
>
> Is the suggestion here to implement `__getitem__` on PTransform/ParDo so a
> particular pcollection can be specified? This would definitely be an
> improvement from the current state. I think one further improvement would be
> if we could specify the pcollection by attribute rather than by key/string,
> so `Partition(...).main` instead, but that risks pcollection name and
> ptransform method collisions.
>
> I'm still partial toward the other suggestions, particularly towards
> implementing `PTransform.with_outputs`, but this is probably sufficient for
> my context.
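
For concreteness, here is a rough sketch of how the `Partition(...)['main']` spelling quoted above could work if `__getitem__` on a transform returned a wrapper that selects one tagged output. This is a toy model in plain Python, not Beam code: `ToyPCollection`, `ToyTransform`, `ToyPartition`, `ToyMap` and `run_chain` are all hypothetical stand-ins, and a real implementation would have to deal with labels, type hints and pipeline construction.

```python
# Toy stand-ins only: none of these classes are Beam's real implementation.


class ToyPCollection:
    """Hypothetical stand-in for a PCollection: just a list of elements."""

    def __init__(self, elements):
        self.elements = list(elements)


class ToyTransform:
    """Hypothetical base class: `value | transform` calls transform.expand(value)."""

    def expand(self, pcoll):
        raise NotImplementedError

    def __ror__(self, pcoll):
        # ToyPCollection defines no __or__, so `pcoll | transform` lands here.
        return self.expand(pcoll)

    def __getitem__(self, tag):
        # Wrap this transform so that applying it yields only the tagged output.
        return _SelectOutput(self, tag)


class _SelectOutput(ToyTransform):
    """Applies the wrapped transform, then picks one output by tag."""

    def __init__(self, inner, tag):
        self.inner = inner
        self.tag = tag

    def expand(self, pcoll):
        return self.inner.expand(pcoll)[self.tag]


class ToyPartition(ToyTransform):
    """Splits elements into a 'main' and a 'rest' output using a predicate."""

    def __init__(self, is_main):
        self.is_main = is_main

    def expand(self, pcoll):
        main = [e for e in pcoll.elements if self.is_main(e)]
        rest = [e for e in pcoll.elements if not self.is_main(e)]
        return {'main': ToyPCollection(main), 'rest': ToyPCollection(rest)}


class ToyMap(ToyTransform):
    """Applies a function to every element."""

    def __init__(self, fn):
        self.fn = fn

    def expand(self, pcoll):
        return ToyPCollection(self.fn(e) for e in pcoll.elements)


def run_chain(values):
    pcoll = ToyPCollection(values)
    # Subscripting binds tighter than `|`, so the chain stays unbroken.
    result = (
        pcoll
        | ToyPartition(lambda x: x % 2 == 0)['main']
        | ToyMap(lambda x: x * 10))
    return result.elements
```

In this toy, the selection lives entirely on the consumer side, which is the education cost mentioned above: every pipeline author has to know the subscript exists.
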
I'll admit that I'm actually not a fan of with_outputs(...). It's not very DRY--I'd rather the consumer decide what it wants to consume by consuming it than have to also (redundantly) specify it on the producer. I think it dates back to trying to copy Java, where the return type needs to be a typed PValue. Were I to do it again, I would have such transforms return a dict or named tuple (if all outputs are meaningful) or an "augmented" PCollection (as has been proposed here) when they are auxiliary (and preferably leave the decision up to the DoFn implementor, not the caller).

- Robert
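
To illustrate the "augmented" PCollection option, a minimal sketch in plain Python follows. It is a toy model under loud assumptions: `ToyPCollection`, `parse_ints` and the `errors` output name are hypothetical and are not Beam APIs. The point is only that the producer picks the shape of its result once, and each consumer takes what it wants by consuming it, with nothing declared up front on the producer call.

```python
# Toy model only; these names are hypothetical and not part of Beam.


class ToyPCollection:
    """Main output that can carry auxiliary outputs as attributes."""

    def __init__(self, elements, **auxiliary):
        self.elements = list(elements)
        # Auxiliary outputs ride along as attributes of the main output.
        for name, pcoll in auxiliary.items():
            setattr(self, name, pcoll)

    def __or__(self, fn):
        # Stand-in for applying a transform: map fn over the elements.
        return ToyPCollection(fn(e) for e in self.elements)


def parse_ints(pcoll):
    """Producer decides the shape: main output plus an auxiliary `errors`."""
    good, bad = [], []
    for e in pcoll.elements:
        try:
            good.append(int(e))
        except ValueError:
            bad.append(e)
    return ToyPCollection(good, errors=ToyPCollection(bad))


parsed = parse_ints(ToyPCollection(['1', 'x', '3']))
doubled = parsed | (lambda n: n * 2)  # happy path chains on the main output
failures = parsed.errors              # sad path is consumed only if wanted
```

Attribute access has the collision risk raised earlier in the thread (an auxiliary output named `elements` would clash here), so a real design would need a rule for that.
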
