Re: [BEAM-9322] Python SDK discussion on correct output tag names

Robert Bradshaw Wed, 01 Apr 2020 12:45:23 -0700

I'm -1 on this, it adds additional restrictions and I don't see what this
buys us. (In particular, it doesn't address the original issue.)


On Wed, Apr 1, 2020 at 12:41 PM Sam Rohde <sro...@google.com> wrote:

> So then how does the proposal sound?
>
> Writing again here:
> PTransform.expand: (...) -> Union[PValue, NamedTuple[str, PCollection],
> Tuple[str, PCollection], Dict[str, PCollection], DoOutputsTuple]
>
> i.e. no arbitrary nesting when outputting from an expand
>
> On Tue, Mar 31, 2020 at 5:15 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> On Tue, Mar 31, 2020 at 4:13 PM Luke Cwik <lc...@google.com> wrote:
>> >
>> > It is important that composites know how things are named so that any
>> embedded payloads in the composite PTransform can reference the outputs
>> appropriately.
>>
>> Very good point. This is part of the cleanup to treat inputs and
>> outputs of PCollections as maps rather than lists generally across the
>> Python representations (which also relates to some of the ugliness
>> that Cham has been running into with cross-language).
>>
>> > On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw <rober...@google.com>
>> wrote:
>> >>
>> >> On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde <sro...@google.com> wrote:
>> >> >>>
>> >> >>> * Don't allow arbitrary nestings returned during expansion, force
>> composite transforms to always provide an unambiguous name (either a tuple
>> with PCollections with unique tags or a dictionary with untagged
>> PCollections or a singular PCollection (Java and Go SDKs do this)).
>> >> >>
>> >> >> I believe that aligning with Java and Go would be the right way to
>> go here. I don't know if this would limit expressiveness.
>> >> >
>> >> > Yeah this sounds like a much more elegant way of handling this
>> situation. I would lean towards this limiting expressiveness because there
>> would be a limit to nesting, but I think that the trade-off with reducing
>> complexity is worth it.
>> >> >
>> >> > So in summary it could be:
>> >> > PTransform.expand: (...) -> Union[PValue, NamedTuple[str,
>> PCollection], Tuple[str, PCollection], Dict[str, PCollection],
>> DoOutputsTuple]
>> >> >
>> >> > With the expectation that (pseudo-code):
>> >> > a_transform = ATransform()
>> >> >
>> ATransform.from_runner_api(a_transform.to_runner_api()).outputs.keys() ==
>> a_transform.outputs.keys()
>> >> >
>> >> > Since this changes the Python SDK composite transform API, what
>> would be the next steps for the community to come to a consensus on this?
>> >>
>> >> It seems here we're conflating the nesting of PValue results with the
>> >> nesting of composite operations.
>> >>
>> >> Both examples in the original post have PTransform nesting (a
>> >> composite) returning a flat tuple. This is completely orthogonal to
>> >> the idea of a PTransform returning a nested result (such as (pc1,
>> >> (pc2, pc3))) and forbidding the latter won't solve the former.
>> >>
>> >> Currently, with the exception of explicit names given for multi-output
>> >> ParDos, we simply label the outputs sequentially with 0, 1, 2, 3, ...
>> >> (Actually, for historical reasons, it's None, 1, 2, 3, ...), no matter
>> >> the nesting. We could do better, e.g. for the example above, label
>> >> them "0", "1.0", "1.1", or use the keys in the returned dict, but this
>> >> is separate from the idea of trying to relate the output tags of
>> >> composites to the output tags of their inner transforms.
>> >>
>> >> - Robert
>>
>

Re: [BEAM-9322] Python SDK discussion on correct output tag names

Reply via email to