[
https://issues.apache.org/jira/browse/BEAM-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809470#comment-16809470
]
Kishan Kumar commented on BEAM-6846:
------------------------------------
Hi [~kenn] I have Added the Snippet Code for The Steps Where we found
Performance Degradation and System Lag While Running Code in 2.7.0 and 2.8.0
*Note*: TableRow Clone is Also USed in DoFn's which I haven't Specified.
In DOFn We are Simple Using Lambda/Streaming for Finding Highest DateTime in
Column for Final Decision.
*Snippet*
PCollection<KV<String,TableRow>> addLongUDTM = pBegin.apply("Read records from
child table",
BigQueryIO.readTableRows().from(options.getChildTableDescription()).withoutValidation().withTemplateCompatibility())
.apply("Add long UDTM",ParDo.of(new
AddLongUDTM(childSDKey)).withSideInputs(childSDKey))
.apply("Generate Merchant Key Value",ParDo.of(new
GenerateMerchantKeyValue(childSDKey)).withSideInputs(childSDKey));
//merchant-wise max in KV<merchant,long>
PCollection<KV<String,Long>> merchantWiseMaxUpdate = addLongUDTM
.apply("Apply Group By Key",GroupByKey.<String, TableRow>create())
.apply("Merchant-wise Greatest Update_DTM",ParDo.of(new
CommonMerchantGreatest()));
final TupleTag<TableRow> kvChildRows = new TupleTag<>();
final TupleTag<Long> maxIndexMerchant = new TupleTag<>();
//Merge collection values into a CoGbkResult collection
PCollection<KV<String, CoGbkResult>> coGbkResultCollection =
KeyedPCollectionTuple.of(kvChildRows, addLongUDTM)
.and(maxIndexMerchant, merchantWiseMaxUpdate)
.apply(CoGroupByKey.<String>create());
> Beam Performance Degradation on Dataflow Runner
> -----------------------------------------------
>
> Key: BEAM-6846
> URL: https://issues.apache.org/jira/browse/BEAM-6846
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Affects Versions: 2.7.0, 2.8.0
> Reporter: Kishan Kumar
> Priority: Major
> Labels: triaged
>
> After Upgrading From 2.5.0 Version the jobs of Version 2.7.0 are taking a
> long time compared to 2.5.0.
> System Lag is also got introduced the Error Getting Generated:
> "Workflow failed. Causes: The Dataflow job appears to be stuck because no
> worker activity has been seen in the last 1h."
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)