[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

Twinkle Sachdeva (JIRA) Tue, 03 Feb 2015 02:13:55 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303039#comment-14303039
 ]


Twinkle Sachdeva commented on SPARK-4705:
-----------------------------------------

Hi [~bcwalrus],

(Please ignore the pull request above for now; it is not complete.)

This is the overall strategy I am thinking of taking for this issue.

There will be two kinds of applications (depending on the cluster manager and
mode being used):
1. Applications for which only one attempt is possible.
    I am thinking of leaving the folder structure of the event logs, as well as
the history server UI, intact.
2. Applications for which more than one attempt may be tried.
   In this case, I am thinking of changing the folder structure of the event
logs to <application_id>/<attempt_id>, so that logDir is different for each
attempt while all of an application's attempt logs stay inside one directory.
  Regarding the history server UI, there will be two cases:
   2.1 The application succeeded in one attempt. Here we can keep the UI as it
is today, but this will make the UI look inconsistent if somebody is using
yarn-cluster and some applications completed in multiple attempts.
   2.2 The application completed in more than one attempt. Here we have two
options:
           2.2.1 If somebody clicks on the application id, another page is
loaded, which shows a table listing all the attempts of the application. On
clicking one of the attempts, we show the UI as we show it today, specific to
that attempt.
           2.2.2 If somebody clicks on the application id, we show, on the same
page, a subtable or some kind of list with links for all attempts. On clicking
an attempt id link, we show the UI as we show it today, specific to that
attempt.
      In both of these options, we will need to change the header to also show
the attempt id.
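
To make the proposed layout concrete, here is a minimal sketch in Python (not
Spark code). The function names, the attempt id format and the base directory
are assumptions made only for illustration; the actual change would live in
Spark's event logging and history server code.

```python
def event_log_dir(base_dir, app_id, attempt_id=None):
    """Return the event log directory for one application attempt.

    Hypothetical helper, not part of Spark. With no attempt_id (case 1, only
    one attempt possible) the current layout <base_dir>/<app_id> is kept.
    With an attempt_id (case 2) the logs go to <base_dir>/<app_id>/<attempt_id>.
    """
    parts = [base_dir.rstrip("/"), app_id]
    if attempt_id is not None:
        parts.append(attempt_id)
    return "/".join(parts)


def group_attempts(attempts):
    """Group (app_id, attempt_id) pairs by application, keeping input order.

    Hypothetical helper sketching the data a history listing would need for
    options 2.2.1 and 2.2.2: one entry per application, with its attempts.
    """
    grouped = {}
    for app_id, attempt_id in attempts:
        grouped.setdefault(app_id, []).append(attempt_id)
    return grouped


if __name__ == "__main__":
    base = "/user/spark/applicationHistory"  # assumed base directory
    app = "application_1417554558066_0003"
    # Each attempt gets its own directory, so a retry does not collide
    # with the directory created by the previous attempt.
    print(event_log_dir(base, app, "1"))
    print(event_log_dir(base, app, "2"))
    print(group_attempts([(app, "1"), (app, "2")]))
```

Because the attempt id is part of the path, a retried driver would not hit the
"already exists" check on the directory of the first attempt, while all
attempts of one application remain under a single parent directory.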

Please provide your suggestions.

Thanks,
Twinkle

> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-4705
>                 URL: https://issues.apache.org/jira/browse/SPARK-4705
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>
> yarn-cluster mode will retry running the driver in certain failure modes. If 
> event logging is enabled, the retry will most probably fail, because:
> {noformat}
> Exception in thread "Driver" java.io.IOException: Log directory hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 already exists!
>         at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
>         at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
>         at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app 
> should clean up the old logs first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
