Hi All,

I am looking for input on the effects of collapsing multiple discrete
transforms into one single PTransform that performs all of the steps.

Which is the better approach: multiple discrete transforms, or one single
transform that does everything in a lambda calling multiple functions?

Specifically, does combining multiple transforms into one single transform
have any effect on performance, debugging, metrics, or any other factors,
and are there best practices for choosing between the two?

Version A
    PCollection<MyType> myRecords = pbegin
        .apply("Kinesis Source", readfromKinesis()) //transform1
        .apply(MapElements
            .into(TypeDescriptors.strings())
            .via(record -> new String(record.getDataAsBytes()))) //transform2
        .apply(convertByteStringToJsonNode()) //transform3
        .apply(schematizeElements()); //transform4

Version B
    PCollection<MyType> myRecords = pbegin
        .apply("Kinesis Source", readfromKinesis()) //transform1
        .apply(MapElements
            .into(TypeDescriptor.of(MyType.class))
            .via(inputKinesisRecord -> {
                // Here the helpers are plain functions, not PTransforms.
                String record = new String(inputKinesisRecord.getDataAsBytes());
                JsonNode jsonNode = convertByteStringToJsonNode(record);
                MyType outputElement = getSchematizedElement(jsonNode);
                return outputElement;
            })); //transform2
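
To make the two shapes concrete without any Beam dependency, here is a minimal plain-Java sketch of the same contrast: each step applied separately versus one function composed from all the steps. Everything in it is a hypothetical placeholder (the class StepStyles, and DECODE, PARSE and SCHEMATIZE, where trimming and taking the length merely stand in for the real JSON parsing and schematizing). It only shows that both shapes compute the same result per element; it says nothing about how a runner would execute or report them.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical illustration only; none of these names are Beam APIs.
class StepStyles {
    // Stand-ins for the three per-element steps from the pipeline.
    static final Function<byte[], String> DECODE =
        bytes -> new String(bytes, StandardCharsets.UTF_8);
    static final Function<String, String> PARSE = String::trim;        // placeholder for JSON parsing
    static final Function<String, Integer> SCHEMATIZE = String::length; // placeholder for schematizing

    // "Version A" shape: every step is applied on its own, so each
    // intermediate value is visible and can be inspected separately.
    static Integer discrete(byte[] input) {
        String decoded = DECODE.apply(input);
        String parsed = PARSE.apply(decoded);
        return SCHEMATIZE.apply(parsed);
    }

    // "Version B" shape: the steps are composed into one function that is
    // applied once per element.
    static final Function<byte[], Integer> COMBINED =
        DECODE.andThen(PARSE).andThen(SCHEMATIZE);

    public static void main(String[] args) {
        byte[] record = "  {\"id\": 1}  ".getBytes(StandardCharsets.UTF_8);
        System.out.println(discrete(record));
        System.out.println(COMBINED.apply(record));
    }
}
```

Both calls print the same value for the same input, so my question is only about the operational differences between the two shapes, not about correctness.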


Thanks in advance!
Amit
