Re: [Proposal] Add a static PTransform.compose() method for composing transforms in a lambda expression

Jeff Klukas Wed, 19 Sep 2018 14:19:26 -0700

Thanks for the thoughts, Lukasz. These are exactly the kinds of issues I
was hoping to get context on, since I don't yet have extensive experience
with beam.


I have not yet run into issues where the output coder was not able to be
inferred. I expect this may be a non-issue, as the individual transforms
used within a user-provided lambda expression would presumably expose the
ability to specify a coder.

I don't have enough context yet to comment on whether display data might be
an issue, so I do hope the user list can provide input there.

On Wed, Sep 19, 2018 at 4:52 PM Lukasz Cwik <[email protected]> wrote:

> Thanks for the proposal and it does seem to make the API cleaner to build
> anonymous composite transforms.
>
> In your experience have you had issues where the API doesn't work out well
> because the PTransform:
> * is not able to override how the output coder is inferred?
> * can't supply display data?
>
> [email protected] <[email protected]>, do users think that the
> provided API would be useful enough for it to be added to the core SDK or
> would the addition of the method provide noise/detract from the existing
> API?
>
> On Mon, Sep 17, 2018 at 12:57 PM Jeff Klukas <[email protected]> wrote:
>
>> I've gone ahead and filed a JIRA Issue and GitHub PR to follow up on this
>> suggestion and make it more concrete:
>>
>> https://issues.apache.org/jira/browse/BEAM-5413
>> https://github.com/apache/beam/pull/6414
>>
>> On Fri, Sep 14, 2018 at 1:42 PM Jeff Klukas <[email protected]> wrote:
>>
>>> Hello all, I'm a data engineer at Mozilla working on a first project
>>> using Beam. I've been impressed with the usability of the API as there are
>>> good built-in solutions for handling many simple transformation cases with
>>> minimal code, and wanted to discuss one bit of ergonomics that seems to be
>>> missing.
>>>
>>> It appears that none of the existing PTransform factories are generic
>>> enough to take in or output a PCollectionTuple, but we've found many use
>>> cases where it's convenient to apply a few transforms on a PCollectionTuple
>>> in a lambda expression.
>>>
>>> For example, we've defined several PTransforms that return main and
>>> error output stream bundled in a PCollectionTuple. We defined a
>>> CompositeTransform interface so that we could handle the error output in a
>>> lambda expression like:
>>>
>>> pipeline
>>>     .apply("attempt to deserialize messages", new
>>> MyDeserializationTransform())
>>>     .apply("write deserialization errors",
>>>         CompositeTransform.of((PCollectionTuple input) -> {
>>>             input.get(errorTag).apply(new MyErrorOutputTransform())
>>>             return input.get(mainTag);
>>>         })
>>>     .apply("more processing on the deserialized messages", new
>>> MyOtherTransform())
>>>
>>> I'd be interested in contributing a patch to add this functionality,
>>> perhaps as a static method PTransform.compose(). Would that patch be
>>> welcome? Are there other thoughts on naming?
>>>
>>> The full code of the CompositeTransform interface we're currently using
>>> is included below.
>>>
>>>
>>> public interface CompositeTransform<InputT extends PInput, OutputT
>>> extends POutput> {
>>>   OutputT expand(InputT input);
>>>
>>>   /**
>>>    * The public factory method that serves as the entrypoint for users
>>> to create a composite PTransform.
>>>    */
>>>   static <InputT extends PInput, OutputT extends POutput>
>>>         PTransform<InputT, OutputT> of(CompositeTransform<InputT,
>>> OutputT> transform) {
>>>     return new PTransform<InputT, OutputT>() {
>>>       @Override
>>>       public OutputT expand(InputT input) {
>>>         return transform.expand(input);
>>>       }
>>>     };
>>>   }
>>> }
>>>
>>>
>>>
>>>

Re: [Proposal] Add a static PTransform.compose() method for composing transforms in a lambda expression

Reply via email to