Use discrete transforms. If you merge them all into one transform, you lose visibility into the individual steps and end up rebuilding what already exists to provide that visibility. You'll also be rebuilding the APIs that help users combine their functions. In fact, you'll find that you're rebuilding much of what Apache Beam already provides.
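
To make the "rebuilding" point concrete, here is a rough, plain-Java sketch of the bookkeeping you would have to hand-roll to get per-step visibility back once everything lives in one lambda. Nothing in it is Beam API: `StepVisibilitySketch`, `NamedChain`, `then`, and the step names are all hypothetical, and the steps (decode bytes, parse an integer, double it) are stand-ins for the decode, parse, and schematize steps in the quoted message.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these types belong to Apache Beam.
class StepVisibilitySketch {

    /** A chain of named steps that counts how many elements each step emitted. */
    static final class NamedChain<I, O> {
        private final Function<I, O> fn;
        private final Map<String, Long> counts;

        private NamedChain(Function<I, O> fn, Map<String, Long> counts) {
            this.fn = fn;
            this.counts = counts;
        }

        static <T> NamedChain<T, T> start() {
            return new NamedChain<>(Function.identity(), new LinkedHashMap<>());
        }

        /** Appends a named step; its counter only advances when the step succeeds. */
        <R> NamedChain<I, R> then(String name, Function<O, R> step) {
            counts.put(name, 0L);
            Function<I, R> next = fn.andThen(value -> {
                R out = step.apply(value);
                counts.merge(name, 1L, Long::sum);
                return out;
            });
            return new NamedChain<>(next, counts);
        }

        O apply(I input) {
            return fn.apply(input);
        }

        Map<String, Long> counts() {
            return counts;
        }
    }

    /** Runs three stand-in steps over the records and returns the per-step counts. */
    static Map<String, Long> run(List<String> rawRecords) {
        NamedChain<byte[], Integer> chain = NamedChain.<byte[]>start()
            .then("BytesToString", bytes -> new String(bytes, StandardCharsets.UTF_8))
            .then("ParseInt", Integer::parseInt)
            .then("Double", n -> n * 2);

        for (String raw : rawRecords) {
            try {
                chain.apply(raw.getBytes(StandardCharsets.UTF_8));
            } catch (NumberFormatException e) {
                // A bad record stops at the step that rejected it.
            }
        }
        return chain.counts();
    }

    public static void main(String[] args) {
        // The record "x" decodes fine but fails to parse.
        System.out.println(run(List.of("1", "2", "x")));
        // prints {BytesToString=3, ParseInt=2, Double=2}
    }
}
```

The counts show at a glance that one record was lost at the parse step. A single merged lambda would only tell you that three records went in and two came out. Keeping the steps as separate named transforms gives you that breakdown without writing or maintaining any of this yourself.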

On Thu, Feb 20, 2020 at 8:19 PM amit kumar <[email protected]> wrote:

> Hi All,
>
> I am looking for inputs to understand the effects of converting multiple
> discrete transforms into one single transformation. (and performing all
> steps into one single PTransform).
>
> What is better approach, multiple discrete transforms vs one single
> transform with lambdas and multiple functions ?
>
> I wanted to understand the effect of combining multiple transforms into
> one single transform and doing everything in a lambda via Functions, will
> there be any affect in performance or debugging, metrics or any other
> factors and best practices?
>
> Version A
> PCollection<MyType> myRecords = pbegin
>     .apply("Kinesis Source", readfromKinesis()) //transform1
>     .apply(MapElements
>         .into(TypeDescriptors.strings())
>         .via(record -> new String(record.getDataAsBytes()))) //transform2
>     .apply(convertByteStringToJsonNode()) //transform3
>     .apply(schematizeElements()); //transform4
>
> Version B
> PCollection<MyType> myRecords = pbegin
>     .apply("Kinesis Source", readfromKinesis()) transform1
>     .apply( inputKinesisRecord -> {
>         String record = inputKinesisRecord.getDataAsBytes();
>         JsonNode jsonNode = convertByteStringToJsonNode(record);
>         SchematizedElement outputElement =
>             getSchematzedElement(jsonNode))
>         return outputElement; }) transform2
>
>
> Thanks in advance!
> Amit
