Re: Discrete Transforms vs One Single transform

Luke Cwik Thu, 20 Feb 2020 22:09:23 -0800

Use discrete transforms.

If you merge them all into one transform you will lose visibility into the
different parts and will be rebuilding what already exists to provide that
visibility. You'll also be rebuilding the APIs that help users combine all
their functions together. You'll actually find that you'll be rebuilding
lots of what Apache Beam provides.
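
To make the "rebuilding" point concrete, here is a minimal plain-Java sketch of what a per-step view costs once you own the plumbing yourself. StepChain, its apply(name, fn) method, and the per-step counters are all hypothetical names invented for this illustration; none of it is Beam API. It only shows that named, chainable steps with per-step counts are something you would have to write and maintain by hand.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a chain of named steps; NOT part of Apache Beam. */
final class StepChain<I, O> {
    private final Function<I, O> fn;          // composition of all steps so far
    private final Map<String, Long> counts;   // elements that reached each named step

    private StepChain(Function<I, O> fn, Map<String, Long> counts) {
        this.fn = fn;
        this.counts = counts;
    }

    /** Starts an empty chain that passes its input through unchanged. */
    static <T> StepChain<T, T> start() {
        return new StepChain<>(Function.identity(), new LinkedHashMap<>());
    }

    /** Appends a named step and registers a counter for it. */
    <R> StepChain<I, R> apply(String name, Function<? super O, ? extends R> step) {
        counts.put(name, 0L);
        Function<I, R> next = in -> {
            O mid = fn.apply(in);              // run all earlier steps first
            counts.merge(name, 1L, Long::sum); // record that this step was reached
            return step.apply(mid);
        };
        return new StepChain<>(next, counts);
    }

    /** Runs one element through every step in order. */
    O process(I input) {
        return fn.apply(input);
    }

    /** Per-step counters, in the order the steps were added. */
    Map<String, Long> counts() {
        return counts;
    }

    public static void main(String[] args) {
        StepChain<byte[], Integer> chain = StepChain.<byte[]>start()
            .apply("Decode", bytes -> new String(bytes, StandardCharsets.UTF_8))
            .apply("Trim", String::trim)
            .apply("Length", String::length);

        System.out.println(chain.process(" hello ".getBytes(StandardCharsets.UTF_8))); // prints 5
        System.out.println(chain.counts()); // prints {Decode=1, Trim=1, Length=1}
    }
}
```

If the three steps were folded into a single lambda, there would be one counter (or none) for the whole thing, and a failure in the middle would not show which step it came from. Keeping the steps discrete in the pipeline gives you that breakdown without code like the above.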


On Thu, Feb 20, 2020 at 8:19 PM amit kumar <[email protected]> wrote:

> Hi All,
>
> I am looking for input on the effects of converting multiple discrete
> transforms into one single transform (that is, performing all steps
> inside one single PTransform).
>
> Which is the better approach: multiple discrete transforms, or one single
> transform with lambdas and multiple functions?
>
> I want to understand the effect of combining multiple transforms into
> one single transform and doing everything in a lambda via functions. Will
> there be any effect on performance, debugging, metrics, or any other
> factors, and what are the best practices?
>
> Version A
>     PCollection<MyType> myRecords = pbegin
>         .apply("Kinesis Source", readfromKinesis()) //transform1
>         .apply(MapElements
>             .into(TypeDescriptors.strings())
>             .via(record -> new String(record.getDataAsBytes()))) //transform2
>         .apply(convertByteStringToJsonNode()) //transform3
>         .apply(schematizeElements()); //transform4
>
> Version B
>     PCollection<MyType> myRecords = pbegin
>         .apply("Kinesis Source", readfromKinesis()) //transform1
>         .apply(MapElements
>             .into(TypeDescriptor.of(MyType.class))
>             .via(inputKinesisRecord -> {
>                 // decode, parse, and schematize in a single step
>                 String record = new String(inputKinesisRecord.getDataAsBytes());
>                 JsonNode jsonNode = convertByteStringToJsonNode(record);
>                 return getSchematzedElement(jsonNode);
>             })); //transform2
>
>
> Thanks in advance!
> Amit
>
