I'm -1 on this, it adds additional restrictions and I don't see what this buys us. (In particular, it doesn't address the original issue.)
On Wed, Apr 1, 2020 at 12:41 PM Sam Rohde <sro...@google.com> wrote: > So then how does the proposal sound? > > Writing again here: > PTransform.expand: (...) -> Union[PValue, NamedTuple[str, PCollection], > Tuple[str, PCollection], Dict[str, PCollection], DoOutputsTuple] > > i.e. no arbitrary nesting when outputting from an expand > > On Tue, Mar 31, 2020 at 5:15 PM Robert Bradshaw <rober...@google.com> > wrote: > >> On Tue, Mar 31, 2020 at 4:13 PM Luke Cwik <lc...@google.com> wrote: >> > >> > It is important that composites know how things are named so that any >> embedded payloads in the composite PTransform can reference the outputs >> appropriately. >> >> Very good point. This is part of the cleanup to treat inputs and >> outputs of PCollections as maps rather than lists generally across the >> Python representations (which also relates to some of the ugliness >> that Cham has been running into with cross-language). >> >> > On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw <rober...@google.com> >> wrote: >> >> >> >> On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde <sro...@google.com> wrote: >> >> >>> >> >> >>> * Don't allow arbitrary nestings returned during expansion, force >> composite transforms to always provide an unambiguous name (either a tuple >> with PCollections with unique tags or a dictionary with untagged >> PCollections or a singular PCollection (Java and Go SDKs do this)). >> >> >> >> >> >> I believe that aligning with Java and Go would be the right way to >> go here. I don't know if this would limit expressiveness. >> >> > >> >> > Yeah this sounds like a much more elegant way of handling this >> situation. I would lean towards this limiting expressiveness because there >> would be a limit to nesting, but I think that the trade-off with reducing >> complexity is worth it. >> >> > >> >> > So in summary it could be: >> >> > PTransform.expand: (...) -> Union[PValue, NamedTuple[str, >> PCollection], Tuple[str, PCollection], Dict[str, PCollection], >> DoOutputsTuple] >> >> > >> >> > With the expectation that (pseudo-code): >> >> > a_transform = ATransform() >> >> > >> ATransform.from_runner_api(a_transform.to_runner_api()).outputs.keys() == >> a_transform.outputs.keys() >> >> > >> >> > Since this changes the Python SDK composite transform API, what >> would be the next steps for the community to come to a consensus on this? >> >> >> >> It seems here we're conflating the nesting of PValue results with the >> >> nesting of composite operations. >> >> >> >> Both examples in the original post have PTransform nesting (a >> >> composite) returning a flat tuple. This is completely orthogonal to >> >> the idea of a PTransform returning a nested result (such as (pc1, >> >> (pc2, pc3))) and forbidding the latter won't solve the former. >> >> >> >> Currently, with the exception of explicit names given for multi-output >> >> ParDos, we simply label the outputs sequentially with 0, 1, 2, 3, ... >> >> (Actually, for historical reasons, it's None, 1, 2, 3, ...), no matter >> >> the nesting. We could do better, e.g. for the example above, label >> >> them "0", "1.0", "1.1", or use the keys in the returned dict, but this >> >> is separate from the idea of trying to relate the output tags of >> >> composites to the output tags of their inner transforms. >> >> >> >> - Robert >> >