[ https://issues.apache.org/jira/browse/BEAM-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
kunal updated BEAM-8838:
------------------------
    Component/s:     (was: community-metrics)
                     runner-spark

> Apache Beam Metrics Counter giving incorrect count using SparkRunner
> --------------------------------------------------------------------
>
>                 Key: BEAM-8838
>                 URL: https://issues.apache.org/jira/browse/BEAM-8838
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>    Affects Versions: 2.13.0, 2.14.0, 2.16.0
>         Environment: Cloudera Express 6.2.0
> Java Version: 1.8.0_181
> Spark 2.4.0-cdh6.2.0
> 1 Master Node and 3 Data Nodes (64 cores, 128 GB RAM)
> --driver-memory "2g" --num-executors "6" --executor-cores "3"
>            Reporter: kunal
>            Priority: Major
>
> I have source and target CSV files with 10 million records and 250
> columns. I am running an Apache Beam pipeline that joins all columns from
> the source and target files. When I run this on the Spark cluster, the
> pipeline executes correctly with no exceptions, but the join Beam metrics
> counter returns double the count when the Spark property
> --executor-memory "2g" is used. When I increase the executor memory to
> 11g, it returns the correct count.
> The count doubles only when I dump the results to a file; if I don't dump
> them, the counts are correct.
> Note:
> [https://stackoverflow.com/questions/59032734/apache-beam-metrics-counter-giving-incorrect-count-using-sparkrunner?noredirect=1#comment104344657_59032734]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
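One plausible mechanism for the reported symptom (an assumption, not established in this issue): when executor memory is low, Spark may evict and later recompute a lazily evaluated partition, and if counter increments are accumulated on every execution of the partition's user code, the total doubles. Writing results to a file forces a second evaluation of the same data, which matches the observation that counts are only wrong when the output is dumped. The sketch below is a minimal, self-contained Python toy model of that hypothesis; the `Counter` class and `process_partition` function are illustrative stand-ins, not Beam or Spark APIs.

```python
# Hypothetical sketch: shows how re-executing a partition can double a
# metrics counter when increments are accumulated on every execution.
# This is NOT Beam/Spark code; all names here are illustrative only.

class Counter:
    """Toy stand-in for a Beam Metrics counter."""
    def __init__(self):
        self.value = 0

    def inc(self, n=1):
        self.value += n


def process_partition(records, counter):
    """Processes one partition's records, incrementing the counter per record."""
    out = []
    for r in records:
        counter.inc()  # this side effect repeats if the partition is re-run
        out.append(r)
    return out


counter = Counter()
partition = list(range(1000))  # pretend these are joined records

# First evaluation (e.g. computing the join result).
process_partition(partition, counter)

# Under memory pressure, a lazily evaluated partition may be recomputed
# when its result is needed again (e.g. when also writing output to a file),
# replaying the increments.
process_partition(partition, counter)

print(counter.value)  # double the true record count of 1000
```

If this hypothesis holds, caching/persisting the joined PCollection's underlying RDD, or giving executors enough memory to avoid eviction (as the reporter observed with 11g), would prevent the recomputation and restore correct counts.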