[
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303039#comment-14303039
]
Twinkle Sachdeva commented on SPARK-4705:
-----------------------------------------
Hi [~bcwalrus],
(Please ignore the pull request above for now, as it is not complete.)
This is the overall strategy I am thinking of taking for this issue.
There will be two kinds of applications (based on the cluster manager and mode
being used):
1. Applications which will have only one attempt possible
I am thinking of leaving the folder structure of the event logs, as well as
the history server UI, intact.
2. Applications which can have more than one attempt
In this case, I am thinking of changing the folder structure of the event logs
to <application_id>/<attempt_id>, so that logDir is different for each attempt
while all of an application's attempt logs stay inside one directory.
Regarding the history server UI, there will be two cases:
2.1 The application succeeded in one attempt. Here we can keep the current UI
intact, but the UI will then look inconsistent if somebody is using
yarn-cluster and some applications completed in multiple attempts.
2.2 The application completed in more than one attempt. Here we have
two options:
2.2.1 If somebody clicks on the application id, another page is loaded
with a table listing all the attempts of the application. On clicking one of
the attempts, we show the same UI as today, specific to that attempt.
2.2.2 If somebody clicks on the application id, we show a subtable or
some kind of list on the same page, with links to all the attempts. On
clicking one of these attempt id links, we show the same UI as today, specific
to that attempt.
In both options, we will need to change the header to also show the
attempt id.
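To make the proposed layout concrete, here is a minimal sketch in Python. It is
not Spark code: the function names event_log_dir and list_attempts, the
namenode host, and the attempt ids "1" and "2" are hypothetical and only
illustrate how the log directory could be derived and how a UI could find the
attempts of one application.

```python
import posixpath
from typing import Iterable, List, Optional


def event_log_dir(base_dir: str, app_id: str,
                  attempt_id: Optional[str] = None) -> str:
    """Return the event log directory for one application attempt.

    Hypothetical helper: with no attempt id (case 1) the current layout is
    kept; with an attempt id (case 2) each attempt gets its own directory
    under the application's directory.
    """
    if attempt_id is None:
        return posixpath.join(base_dir, app_id)
    return posixpath.join(base_dir, app_id, attempt_id)


def list_attempts(log_dirs: Iterable[str], base_dir: str,
                  app_id: str) -> List[str]:
    """Return the attempt ids found under <base_dir>/<app_id>/ (assumed layout)."""
    prefix = posixpath.join(base_dir, app_id) + "/"
    return sorted(d[len(prefix):] for d in log_dirs if d.startswith(prefix))


# Example base directory; the host name is made up for illustration.
base = "hdfs://namenode:8020/user/spark/applicationHistory"
app = "application_1417554558066_0003"

# Two attempts of the same application no longer collide on logDir.
first = event_log_dir(base, app, "1")
second = event_log_dir(base, app, "2")
```

With this layout a retry writes to a different directory than the first
attempt, and a listing such as list_attempts([first, second], base, app) gives
the attempt ids needed for either option 2.2.1 or 2.2.2.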
Please provide your suggestions.
Thanks,
Twinkle
> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---------------------------------------------------------------------------
>
> Key: SPARK-4705
> URL: https://issues.apache.org/jira/browse/SPARK-4705
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, YARN
> Affects Versions: 1.2.0
> Reporter: Marcelo Vanzin
>
> yarn-cluster mode will retry to run the driver in certain failure modes. If
> event logging is enabled, this will most probably fail, because:
> {noformat}
> Exception in thread "Driver" java.io.IOException: Log directory hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 already exists!
> at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
> at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
> at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app
> should clean up the old logs first.