[ 
https://issues.apache.org/jira/browse/BEAM-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809470#comment-16809470
 ] 

Kishan Kumar commented on BEAM-6846:
------------------------------------

Hi [~kenn], I have added a code snippet for the steps where we found 
performance degradation and system lag while running the code on 2.7.0 and 2.8.0.
 
*Note*: TableRow clone is also used in the DoFns, which I haven't included here.
In the DoFns we simply use lambdas/streams to find the highest DateTime in a 
column for the final decision.
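
As a rough illustration of the kind of stream-based lookup mentioned in the note above (this is not the actual DoFn code, which is not included in this comment), the following sketch finds the highest timestamp in one column. It assumes each row can be treated as a Map<String, Object>, which TableRow is; the field name update_dtm_long, the class MaxUpdateDtm and its methods are hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

final class MaxUpdateDtm {
    // Hypothetical column name; the real field used in the DoFns is not shown here.
    static final String UDTM_FIELD = "update_dtm_long";

    // Returns the largest value found in UDTM_FIELD, or empty if no row carries it.
    static OptionalLong maxUpdateDtm(List<Map<String, Object>> rows) {
        return rows.stream()
                .map(row -> row.get(UDTM_FIELD))
                .filter(value -> value instanceof Number) // skips missing or non-numeric values
                .mapToLong(value -> ((Number) value).longValue())
                .max();
    }

    // Small helper that builds a stand-in row for the example.
    static Map<String, Object> row(String merchant, Long udtm) {
        Map<String, Object> r = new HashMap<>();
        r.put("merchant", merchant);
        if (udtm != null) {
            r.put(UDTM_FIELD, udtm);
        }
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
                row("m1", 1554300000000L),
                row("m1", 1554390000000L),
                row("m1", null));
        System.out.println(maxUpdateDtm(rows));
    }
}
```

Returning an OptionalLong keeps the "no usable rows" case explicit instead of falling back to a sentinel value.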
 
*Snippet*
PCollection<KV<String,TableRow>> addLongUDTM = pBegin.apply("Read records from child table",
                BigQueryIO.readTableRows().from(options.getChildTableDescription()).withoutValidation().withTemplateCompatibility())
                .apply("Add long UDTM",ParDo.of(new AddLongUDTM(childSDKey)).withSideInputs(childSDKey))
                .apply("Generate Merchant Key Value",ParDo.of(new GenerateMerchantKeyValue(childSDKey)).withSideInputs(childSDKey));

        //merchant-wise max in KV<merchant,long>
        PCollection<KV<String,Long>> merchantWiseMaxUpdate = addLongUDTM
             .apply("Apply Group By Key",GroupByKey.<String, TableRow>create())
             .apply("Merchant-wise Greatest Update_DTM",ParDo.of(new CommonMerchantGreatest()));

        final TupleTag<TableRow> kvChildRows = new TupleTag<>();
        final TupleTag<Long> maxIndexMerchant = new TupleTag<>();

        //Merge collection values into a CoGbkResult collection
        PCollection<KV<String, CoGbkResult>> coGbkResultCollection =
                KeyedPCollectionTuple.of(kvChildRows, addLongUDTM)
                        .and(maxIndexMerchant, merchantWiseMaxUpdate)
                        .apply(CoGroupByKey.<String>create());

> Beam Performance Degradation on Dataflow Runner
> -----------------------------------------------
>
>                 Key: BEAM-6846
>                 URL: https://issues.apache.org/jira/browse/BEAM-6846
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.7.0, 2.8.0
>            Reporter: Kishan Kumar
>            Priority: Major
>              Labels: triaged
>
> After upgrading from version 2.5.0, jobs on version 2.7.0 are taking a 
> long time compared to 2.5.0.
> System lag has also been introduced, and the following error is generated:
> "Workflow failed. Causes: The Dataflow job appears to be stuck because no 
> worker activity has been seen in the last 1h."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
