[ 
https://issues.apache.org/jira/browse/BEAM-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beam JIRA Bot updated BEAM-8838:
--------------------------------
    Labels: stale-P2  (was: )

> Apache Beam Metrics Counter giving incorrect count using SparkRunner
> --------------------------------------------------------------------
>
>                 Key: BEAM-8838
>                 URL: https://issues.apache.org/jira/browse/BEAM-8838
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>    Affects Versions: 2.13.0, 2.14.0, 2.16.0
>         Environment: Cloudera Express 6.2.0
> Java Version: 1.8.0_181
> Spark 2.4.0-cdh6.2.0
> 1 master node and 3 data nodes (64 cores, 128GB RAM)
> --driver-memory "2g"  --num-executors "6" --executor-cores "3"
>            Reporter: kunal
>            Priority: P2
>              Labels: stale-P2
>
> I have source and target CSV files with 10 million records and 250 
> columns. I am running an Apache Beam pipeline that joins all columns from 
> the source and target files. When I run it on a Spark cluster, the pipeline 
> executes with no exceptions, but the Beam metrics counter for the join 
> returns double the expected count when the following Spark property is 
> used: --executor-memory "2g". When I increase executor-memory to 11g, it 
> returns the correct count.
> The count doubles only when I dump the results to a file; if I don't dump 
> them, the counts are correct.
> Note: 
> [https://stackoverflow.com/questions/59032734/apache-beam-metrics-counter-giving-incorrect-count-using-sparkrunner?noredirect=1#comment104344657_59032734]
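
One possible explanation, which this report does not confirm, is that the joined output is evaluated twice when it is not kept in memory: once for the step that reads it and again for the file dump. Spark documents that accumulator updates made inside transformations may be applied more than once if tasks or stages are re-executed. The plain-Java sketch below only illustrates that mechanism under this assumption. It uses no Beam or Spark APIs, and the names RecomputeSketch, run, and joinCounter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class RecomputeSketch {

    /**
     * Simulates a lazily evaluated "join" step with a side-effecting counter
     * and returns the counter value after two consumers have read its output.
     * This is an illustration of an assumed cause, not the reporter's pipeline.
     */
    static long run(int records, boolean cached) {
        AtomicLong joinCounter = new AtomicLong();

        // Lazy "join": every evaluation increments the counter once per record.
        Supplier<List<Integer>> join = () -> {
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < records; i++) {
                joinCounter.incrementAndGet();
                out.add(i);
            }
            return out;
        };

        Supplier<List<Integer>> source = join;
        if (cached) {
            // Enough memory: the output is materialized once and reused.
            List<Integer> materialized = join.get();
            source = () -> materialized;
        }

        int counted = source.get().size(); // consumer 1: some downstream step
        int written = source.get().size(); // consumer 2: the "dump to file"
        if (counted != written) {
            throw new IllegalStateException("consumers saw different outputs");
        }
        return joinCounter.get();
    }

    public static void main(String[] args) {
        System.out.println("not cached: " + run(1000, false)); // prints "not cached: 2000"
        System.out.println("cached:     " + run(1000, true));  // prints "cached:     1000"
    }
}
```

Both consumers see the same 1000 records in either case; only the side-effecting counter differs. That would match the symptom of correct results with a doubled metric, if the assumption holds.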



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
