Re: [BEAM-9322] Python SDK discussion on correct output tag names

Sam Rohde Wed, 01 Apr 2020 12:42:24 -0700

So then how does the proposal sound?

Writing again here:
PTransform.expand: (...) -> Union[PValue, NamedTuple[str, PCollection],
Tuple[str, PCollection], Dict[str, PCollection], DoOutputsTuple]


i.e. no arbitrary nesting when outputting from an expand

On Tue, Mar 31, 2020 at 5:15 PM Robert Bradshaw <rober...@google.com> wrote:

> On Tue, Mar 31, 2020 at 4:13 PM Luke Cwik <lc...@google.com> wrote:
> >
> > It is important that composites know how things are named so that any
> embedded payloads in the composite PTransform can reference the outputs
> appropriately.
>
> Very good point. This is part of the cleanup to treat inputs and
> outputs of PCollections as maps rather than lists generally across the
> Python representations (which also relates to some of the ugliness
> that Cham has been running into with cross-language).
>
> > On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw <rober...@google.com>
> wrote:
> >>
> >> On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde <sro...@google.com> wrote:
> >> >>>
> >> >>> * Don't allow arbitrary nestings returned during expansion, force
> composite transforms to always provide an unambiguous name (either a tuple
> with PCollections with unique tags or a dictionary with untagged
> PCollections or a singular PCollection (Java and Go SDKs do this)).
> >> >>
> >> >> I believe that aligning with Java and Go would be the right way to
> go here. I don't know if this would limit expressiveness.
> >> >
> >> > Yeah this sounds like a much more elegant way of handling this
> situation. I would lean towards this limiting expressiveness because there
> would be a limit to nesting, but I think that the trade-off with reducing
> complexity is worth it.
> >> >
> >> > So in summary it could be:
> >> > PTransform.expand: (...) -> Union[PValue, NamedTuple[str,
> PCollection], Tuple[str, PCollection], Dict[str, PCollection],
> DoOutputsTuple]
> >> >
> >> > With the expectation that (pseudo-code):
> >> > a_transform = ATransform()
> >> >
> ATransform.from_runner_api(a_transform.to_runner_api()).outputs.keys() ==
> a_transform.outputs.keys()
> >> >
> >> > Since this changes the Python SDK composite transform API, what would
> be the next steps for the community to come to a consensus on this?
> >>
> >> It seems here we're conflating the nesting of PValue results with the
> >> nesting of composite operations.
> >>
> >> Both examples in the original post have PTransform nesting (a
> >> composite) returning a flat tuple. This is completely orthogonal to
> >> the idea of a PTransform returning a nested result (such as (pc1,
> >> (pc2, pc3))) and forbidding the latter won't solve the former.
> >>
> >> Currently, with the exception of explicit names given for multi-output
> >> ParDos, we simply label the outputs sequentially with 0, 1, 2, 3, ...
> >> (Actually, for historical reasons, it's None, 1, 2, 3, ...), no matter
> >> the nesting. We could do better, e.g. for the example above, label
> >> them "0", "1.0", "1.1", or use the keys in the returned dict, but this
> >> is separate from the idea of trying to relate the output tags of
> >> composites to the output tags of their inner transforms.
> >>
> >> - Robert
>

Re: [BEAM-9322] Python SDK discussion on correct output tag names

Reply via email to