So then how does the proposal sound? Writing again here: PTransform.expand: (...) -> Union[PValue, NamedTuple[str, PCollection], Tuple[str, PCollection], Dict[str, PCollection], DoOutputsTuple]
i.e. no arbitrary nesting when outputting from an expand On Tue, Mar 31, 2020 at 5:15 PM Robert Bradshaw <rober...@google.com> wrote: > On Tue, Mar 31, 2020 at 4:13 PM Luke Cwik <lc...@google.com> wrote: > > > > It is important that composites know how things are named so that any > embedded payloads in the composite PTransform can reference the outputs > appropriately. > > Very good point. This is part of the cleanup to treat inputs and > outputs of PCollections as maps rather than lists generally across the > Python representations (which also relates to some of the ugliness > that Cham has been running into with cross-language). > > > On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw <rober...@google.com> > wrote: > >> > >> On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde <sro...@google.com> wrote: > >> >>> > >> >>> * Don't allow arbitrary nestings returned during expansion, force > composite transforms to always provide an unambiguous name (either a tuple > with PCollections with unique tags or a dictionary with untagged > PCollections or a singular PCollection (Java and Go SDKs do this)). > >> >> > >> >> I believe that aligning with Java and Go would be the right way to > go here. I don't know if this would limit expressiveness. > >> > > >> > Yeah this sounds like a much more elegant way of handling this > situation. I would lean towards this limiting expressiveness because there > would be a limit to nesting, but I think that the trade-off with reducing > complexity is worth it. > >> > > >> > So in summary it could be: > >> > PTransform.expand: (...) -> Union[PValue, NamedTuple[str, > PCollection], Tuple[str, PCollection], Dict[str, PCollection], > DoOutputsTuple] > >> > > >> > With the expectation that (pseudo-code): > >> > a_transform = ATransform() > >> > > ATransform.from_runner_api(a_transform.to_runner_api()).outputs.keys() == > a_transform.outputs.keys() > >> > > >> > Since this changes the Python SDK composite transform API, what would > be the next steps for the community to come to a consensus on this? > >> > >> It seems here we're conflating the nesting of PValue results with the > >> nesting of composite operations. > >> > >> Both examples in the original post have PTransform nesting (a > >> composite) returning a flat tuple. This is completely orthogonal to > >> the idea of a PTransform returning a nested result (such as (pc1, > >> (pc2, pc3))) and forbidding the latter won't solve the former. > >> > >> Currently, with the exception of explicit names given for multi-output > >> ParDos, we simply label the outputs sequentially with 0, 1, 2, 3, ... > >> (Actually, for historical reasons, it's None, 1, 2, 3, ...), no matter > >> the nesting. We could do better, e.g. for the example above, label > >> them "0", "1.0", "1.1", or use the keys in the returned dict, but this > >> is separate from the idea of trying to relate the output tags of > >> composites to the output tags of their inner transforms. > >> > >> - Robert >