Jozef Vilcek created BEAM-13981:
-----------------------------------

             Summary: Job event log empty on Spark History Server
                 Key: BEAM-13981
                 URL: https://issues.apache.org/jira/browse/BEAM-13981
             Project: Beam
          Issue Type: Bug
          Components: runner-spark
    Affects Versions: 2.33.0
            Reporter: Jozef Vilcek


After upgrading from Beam 2.24.0 to 2.33.0, Spark jobs run on YARN show empty 
data on the History Server after they complete.

The problem seems to be a race condition: two event log writers are started 
for the same application:
{noformat}
22/02/22 10:51:11 INFO EventLoggingListener: Logging events to 
hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1
...
22/02/22 10:51:41 INFO EventLoggingListener: Logging events to 
hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1{noformat}
At the end, the shutdown fails:
{noformat}
22/02/22 11:17:57 INFO SchedulerExtensionServices: Stopping 
SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
22/02/22 11:17:57 ERROR Utils: Uncaught exception in thread Driver
java.io.IOException: Target log file already exists 
(hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1)
        at 
org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:255)
        at 
org.apache.spark.SparkContext.$anonfun$stop$13(SparkContext.scala:1960)
        at 
org.apache.spark.SparkContext.$anonfun$stop$13$adapted(SparkContext.scala:1960)
        at scala.Option.foreach(Option.scala:274)
        at 
org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:1960)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1960)
        at 
org.apache.spark.api.java.JavaSparkContext.stop(JavaSparkContext.scala:654)
        at 
org.apache.beam.runners.spark.translation.SparkContextFactory.stopSparkContext(SparkContextFactory.java:73)
        at 
org.apache.beam.runners.spark.SparkPipelineResult$BatchMode.stop(SparkPipelineResult.java:133)
        at 
org.apache.beam.runners.spark.SparkPipelineResult.offerNewState(SparkPipelineResult.java:234)
        at 
org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
        at 
org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:92)
        at 
com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.$anonfun$main$1(PipelineDriver.scala:41)
        at scala.util.Try$.apply(Try.scala:213)
        at 
com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main(PipelineDriver.scala:34)
        at 
com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main$(PipelineDriver.scala:17)
        at 
com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver$.main(DealStatsDriver.scala:18)
        at 
com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver.main(DealStatsDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684){noformat}
This ends up with an effectively empty file in HDFS and empty job details in 
the History Server.
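The duplicate "Logging events to" lines for the same path suggest two writers racing to create one history file. A minimal, stdlib-only sketch of that collision (class and method names are hypothetical, this is not Spark's actual code): the second attempt to atomically create the same file fails, mirroring the "Target log file already exists" IOException above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class EventLogRace {
    // Simulates an event-log writer claiming the history file for an
    // application. CREATE_NEW is atomic: it fails if the file already
    // exists, so only one of two racing writers can succeed.
    public static boolean openLog(Path logFile) {
        try {
            Files.newOutputStream(logFile, StandardOpenOption.CREATE_NEW).close();
            return true;
        } catch (IOException e) {
            // Second writer lands here: "file already exists".
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempDirectory("history")
                        .resolve("application_1553109013416_1");
        System.out.println(openLog(log)); // first writer wins: true
        System.out.println(openLog(log)); // second writer fails: false
    }
}
```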

 

The problem seems to be introduced by this change:
[https://github.com/apache/beam/commit/9e0b378783ff824c90b17d2c082c0ffddef0d2a0]

Why does Beam run an event listener concurrent with what Spark is already 
doing internally? When I roll back this change in SparkRunner, the problem 
disappears for me.
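Besides rolling the change back, a fix on the Beam side could guard the listener registration so the event log is only ever opened once per application. A minimal sketch of that idea in plain Java (the class and method names are hypothetical; this is not Beam's actual code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard: ensure only one event-log writer is started per JVM,
// so a runner-side listener and Spark's built-in EventLoggingListener
// cannot both try to create the same history file.
public class EventLogGuard {
    private static final AtomicBoolean started = new AtomicBoolean(false);

    // compareAndSet makes the check-then-start atomic: exactly one caller
    // sees true and proceeds to start logging; every later caller skips.
    public static boolean tryStart() {
        return started.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        System.out.println(EventLogGuard.tryStart()); // first caller: true
        System.out.println(EventLogGuard.tryStart()); // duplicate: false
    }
}
```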

I am running the native SparkRunner with Spark 2.4.4.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
