[jira] [Created] (BEAM-8838) Apache Beam Metrics Counter giving incorrect count using SparkRunner

kunal (Jira) Tue, 26 Nov 2019 22:27:59 -0800

kunal created BEAM-8838:
---------------------------

             Summary: Apache Beam Metrics Counter giving incorrect count using SparkRunner
                 Key: BEAM-8838
                 URL: https://issues.apache.org/jira/browse/BEAM-8838
             Project: Beam
          Issue Type: Bug
          Components: community-metrics
    Affects Versions: 2.16.0, 2.14.0, 2.13.0
         Environment: Cloudera Express 6.2.0
Java Version: 1.8.0_181
Spark 2.4.0-cdh6.2.0
1 master node and 3 data nodes (64 cores, 128 GB RAM)
--driver-memory "2g"  --num-executors "6" --executor-cores "3"
            Reporter: kunal



I have source and target CSV files with 10 million records and 250 columns. 
I am running an Apache Beam pipeline that joins all columns from the source 
and target files. When I run it on a Spark cluster, the pipeline executes 
correctly with no exceptions, but the Beam metrics counter for the join 
returns double the expected count when the Spark property 
--executor-memory "2g" is used. When I increase --executor-memory to "11g", 
the counter returns the correct count.
The count doubles only when I write the results to a file; if I do not write 
them out, the counts are correct.
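
The two runs compared above can be written out as spark-submit invocations. 
This is only a sketch of the reported configuration: the main class 
com.example.JoinPipeline and the jar name join-pipeline.jar are hypothetical 
placeholders, and passing --runner=SparkRunner as a pipeline argument is an 
assumption about how the job was launched. The memory, executor, and core 
values are the ones given in the Environment field and in the description.

```sh
# Run A: join counter reported as double the expected value (executor memory 2g).
# com.example.JoinPipeline and join-pipeline.jar are placeholders, not from the report.
spark-submit \
  --class com.example.JoinPipeline \
  --driver-memory "2g" \
  --num-executors "6" \
  --executor-cores "3" \
  --executor-memory "2g" \
  join-pipeline.jar --runner=SparkRunner

# Run B: same job, join counter reported as correct (executor memory 11g).
spark-submit \
  --class com.example.JoinPipeline \
  --driver-memory "2g" \
  --num-executors "6" \
  --executor-cores "3" \
  --executor-memory "11g" \
  join-pipeline.jar --runner=SparkRunner
```

Only --executor-memory differs between the two runs, so it is the variable 
to isolate when reproducing the issue.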


Note:
[https://stackoverflow.com/questions/59032734/apache-beam-metrics-counter-giving-incorrect-count-using-sparkrunner?noredirect=1#comment104344657_59032734]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
