[
https://issues.apache.org/jira/browse/BEAM-13981?focusedWorklogId=740120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-740120
]
ASF GitHub Bot logged work on BEAM-13981:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Mar/22 15:58
Start Date: 11/Mar/22 15:58
Worklog Time Spent: 10m
Work Description: iemejia commented on pull request #17073:
URL: https://github.com/apache/beam/pull/17073#issuecomment-1065244745
I understood it more as a user-friendliness issue, but since this broke
metrics on the Spark History Server I preferred to go straight to removing
the behavior, because the History Server is more important.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 740120)
Time Spent: 40m (was: 0.5h)
> Job event log empty on Spark History Server
> -------------------------------------------
>
> Key: BEAM-13981
> URL: https://issues.apache.org/jira/browse/BEAM-13981
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Affects Versions: 2.33.0
> Reporter: Jozef Vilcek
> Assignee: Ismaël Mejía
> Priority: P2
> Fix For: 2.38.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> After upgrading from Beam 2.24.0 to 2.33.0, Spark jobs run on YARN show
> empty data on the History Server after they complete.
> The problem seems to be a race condition between 2 event logging listeners
> writing to the same event log:
> {noformat}
> 22/02/22 10:51:11 INFO EventLoggingListener: Logging events to hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1
> ...
> 22/02/22 10:51:41 INFO EventLoggingListener: Logging events to hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1{noformat}
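> The two log lines above point at two EventLoggingListener instances writing to the same event-log path. As a minimal sketch (my illustration, not Beam or Spark code) of why the second finalization fails at shutdown, simulated on the local filesystem instead of HDFS:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DuplicateEventLogDemo {

    // Mimics finalizing an event log on stop(): rename the in-progress log
    // to its final name. Files.move without REPLACE_EXISTING throws
    // FileAlreadyExistsException when the target exists, analogous to the
    // "Target log file already exists" IOException raised on HDFS.
    static void finalizeLog(Path inProgress, Path target) throws IOException {
        Files.move(inProgress, target);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jobhistory");
        Path target = dir.resolve("application_1_1");

        // The first listener finalizes its log successfully.
        Path first = Files.createFile(dir.resolve("application_1_1.inprogress-a"));
        finalizeLog(first, target);

        // A second listener finalizing to the same path fails, matching the
        // uncaught exception seen during driver shutdown in the report.
        Path second = Files.createFile(dir.resolve("application_1_1.inprogress-b"));
        try {
            finalizeLog(second, target);
            System.out.println("unexpected: second finalize succeeded");
        } catch (FileAlreadyExistsException e) {
            System.out.println("second finalize failed: target exists");
        }
    }
}
```

> Whichever listener stops second loses the race, and depending on timing the surviving file may hold only a fraction of the events.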
> At the end, the failure:
> {noformat}
> 22/02/22 11:17:57 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
> (serviceOption=None, services=List(), started=false)
> 22/02/22 11:17:57 ERROR Utils: Uncaught exception in thread Driver
> java.io.IOException: Target log file already exists (hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1)
> at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:255)
> at org.apache.spark.SparkContext.$anonfun$stop$13(SparkContext.scala:1960)
> at org.apache.spark.SparkContext.$anonfun$stop$13$adapted(SparkContext.scala:1960)
> at scala.Option.foreach(Option.scala:274)
> at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:1960)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1960)
> at org.apache.spark.api.java.JavaSparkContext.stop(JavaSparkContext.scala:654)
> at org.apache.beam.runners.spark.translation.SparkContextFactory.stopSparkContext(SparkContextFactory.java:73)
> at org.apache.beam.runners.spark.SparkPipelineResult$BatchMode.stop(SparkPipelineResult.java:133)
> at org.apache.beam.runners.spark.SparkPipelineResult.offerNewState(SparkPipelineResult.java:234)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:92)
> at com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.$anonfun$main$1(PipelineDriver.scala:41)
> at scala.util.Try$.apply(Try.scala:213)
> at com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main(PipelineDriver.scala:34)
> at com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main$(PipelineDriver.scala:17)
> at com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver$.main(DealStatsDriver.scala:18)
> at com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver.main(DealStatsDriver.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684){noformat}
> This ends up with a nearly empty file in HDFS and empty job details in the
> History Server.
>
> The problem seems to be introduced by this change:
> [https://github.com/apache/beam/pull/14409]
> Why does Beam run an event listener concurrent to what Spark is already
> doing internally?
> When I roll back the change for SparkRunner, the problem disappears for me.
> I am running the native SparkRunner with Spark 2.4.4.
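> Per the comment at the top of this worklog, the fix removed the extra Beam-side logging behavior so only Spark's built-in event logging remains. For reference, Spark's own History Server logging is driven by configuration rather than by registering an additional listener; a typical spark-defaults.conf fragment (the directory path is illustrative, use your cluster's jobhistory location):

```properties
# Spark's built-in EventLoggingListener writes the history file;
# no second listener registration is needed.
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs:///user/spark/jobhistory
```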
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)