[ 
https://issues.apache.org/jira/browse/BEAM-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jozef Vilcek updated BEAM-13981:
--------------------------------
    Description: 
After upgrading from Beam 2.24.0 to 2.33.0, Spark jobs run on YARN show empty 
data on the History Server after they complete.

The problem seems to be a race condition: two event logging listeners start and write to the same log file:
{noformat}
22/02/22 10:51:11 INFO EventLoggingListener: Logging events to 
hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1
...
22/02/22 10:51:41 INFO EventLoggingListener: Logging events to 
hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1{noformat}
At the end failure:
{noformat}
22/02/22 11:17:57 INFO SchedulerExtensionServices: Stopping 
SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
22/02/22 11:17:57 ERROR Utils: Uncaught exception in thread Driver
java.io.IOException: Target log file already exists 
(hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1)
        at 
org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:255)
        at 
org.apache.spark.SparkContext.$anonfun$stop$13(SparkContext.scala:1960)
        at 
org.apache.spark.SparkContext.$anonfun$stop$13$adapted(SparkContext.scala:1960)
        at scala.Option.foreach(Option.scala:274)
        at 
org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:1960)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1960)
        at 
org.apache.spark.api.java.JavaSparkContext.stop(JavaSparkContext.scala:654)
        at 
org.apache.beam.runners.spark.translation.SparkContextFactory.stopSparkContext(SparkContextFactory.java:73)
        at 
org.apache.beam.runners.spark.SparkPipelineResult$BatchMode.stop(SparkPipelineResult.java:133)
        at 
org.apache.beam.runners.spark.SparkPipelineResult.offerNewState(SparkPipelineResult.java:234)
        at 
org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
        at 
org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:92)
        at 
com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.$anonfun$main$1(PipelineDriver.scala:41)
        at scala.util.Try$.apply(Try.scala:213)
        at 
com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main(PipelineDriver.scala:34)
        at 
com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main$(PipelineDriver.scala:17)
        at 
com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver$.main(DealStatsDriver.scala:18)
        at 
com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver.main(DealStatsDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684){noformat}
This ends up with an empty file in HDFS and empty job details in the History 
Server.
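A language-neutral sketch (plain Python, not actual Beam or Spark code) of the hypothesized failure mode: on stop(), Spark's EventLoggingListener renames the in-progress log to its final path and refuses to overwrite an existing file (the default when spark.eventLog.overwrite is false), so when two listeners share one log path the second stop() fails with exactly the IOException in the stack trace above:

```python
class Hdfs:
    """Toy stand-in for the HDFS namespace holding the event logs."""
    def __init__(self):
        self.paths = set()

    def finalize_log(self, target: str) -> None:
        # Mirrors Spark's rename-on-stop: refuse to overwrite an existing
        # final log file, as EventLoggingListener.stop() does by default.
        if target in self.paths:
            raise IOError(f"Target log file already exists ({target})")
        self.paths.add(target)

hdfs = Hdfs()
path = "hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1"
hdfs.finalize_log(path)      # first listener stops: succeeds
try:
    hdfs.finalize_log(path)  # duplicate listener stops: fails
except IOError as e:
    print(e)
```

Because the failing stop() aborts before the log is fully written, the History Server sees only the truncated file.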

 

The problem seems to be introduced by this change:

[https://github.com/apache/beam/pull/14409]

Why does Beam run an event logging listener concurrently with what Spark is 
already doing internally? When I roll back this change in the SparkRunner, the 
problem disappears for me.

I am running the native SparkRunner with Spark 2.4.4.
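Not a fix for the duplicate listener itself, but a possible mitigation (untested here, an assumption): Spark's standard spark.eventLog.overwrite property lets the rename at stop() replace an existing event log instead of throwing, e.g.:

```shell
# Possible mitigation (untested): let the second stop() overwrite the
# already-finalized event log rather than fail with IOException.
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.overwrite=true \
  ...
```

This would only suppress the exception; whichever listener finishes last still determines the log contents.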

 

> Job event log empty on Spark History Server
> -------------------------------------------
>
>                 Key: BEAM-13981
>                 URL: https://issues.apache.org/jira/browse/BEAM-13981
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>    Affects Versions: 2.33.0
>            Reporter: Jozef Vilcek
>            Priority: P2
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)