[
https://issues.apache.org/jira/browse/BEAM-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504987#comment-17504987
]
Ismaël Mejía commented on BEAM-13981:
-------------------------------------
[~ibzib] [~tomasz.szerszen] this PR essentially reverts BEAM-11213 because it
broke the event log on Spark History Server (and Hadoop). If you want to bring
the functionality back we have to pay attention to not start Event listeners on
our own to avoid the possible race conditions.
> Job event log empty on Spark History Server
> -------------------------------------------
>
> Key: BEAM-13981
> URL: https://issues.apache.org/jira/browse/BEAM-13981
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Affects Versions: 2.33.0
> Reporter: Jozef Vilcek
> Assignee: Ismaël Mejía
> Priority: P2
> Fix For: 2.38.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> After upgrade from Beam 2.24.0 -> 2.33.0 Spark jobs run on YARN after
> complete shows empty data on History server.
> The problem seems to be a race condition and 2
> {noformat}
> 22/02/22 10:51:11 INFO EventLoggingListener: Logging events to
> hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1
> ...
> 22/02/22 10:51:41 INFO EventLoggingListener: Logging events to
> hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1{noformat}
> At the end failure:
> {noformat}
> 22/02/22 11:17:57 INFO SchedulerExtensionServices: Stopping
> SchedulerExtensionServices
> (serviceOption=None,
> services=List(),
> started=false)
> 22/02/22 11:17:57 ERROR Utils: Uncaught exception in thread Driver
> java.io.IOException: Target log file already exists
> (hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1)
> at
> org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:255)
> at
> org.apache.spark.SparkContext.$anonfun$stop$13(SparkContext.scala:1960)
> at
> org.apache.spark.SparkContext.$anonfun$stop$13$adapted(SparkContext.scala:1960)
> at scala.Option.foreach(Option.scala:274)
> at
> org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:1960)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1960)
> at
> org.apache.spark.api.java.JavaSparkContext.stop(JavaSparkContext.scala:654)
> at
> org.apache.beam.runners.spark.translation.SparkContextFactory.stopSparkContext(SparkContextFactory.java:73)
> at
> org.apache.beam.runners.spark.SparkPipelineResult$BatchMode.stop(SparkPipelineResult.java:133)
> at
> org.apache.beam.runners.spark.SparkPipelineResult.offerNewState(SparkPipelineResult.java:234)
> at
> org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
> at
> org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:92)
> at
> com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.$anonfun$main$1(PipelineDriver.scala:41)
> at scala.util.Try$.apply(Try.scala:213)
> at
> com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main(PipelineDriver.scala:34)
> at
> com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main$(PipelineDriver.scala:17)
> at
> com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver$.main(DealStatsDriver.scala:18)
> at
> com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver.main(DealStatsDriver.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684){noformat}
> This ends up with very empty file in HDFS and empty job details in history
> server.
>
> Problem seems to by introduced by this change:
> [https://github.com/apache/beam/pull/14409]
> Why Beam runs concurrent event listener to what spark is doing internally?
> When I roll back change for SparkRunner, problem disappear for me.
> I am running native SparkRunner with Spark 2.4.4
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)