[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

Twinkle Sachdeva (JIRA) Wed, 04 Feb 2015 06:05:25 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305086#comment-14305086
 ]


Twinkle Sachdeva commented on SPARK-4705:
-----------------------------------------

Hi [~vanzin],

Currently, a directory named after the application id is created inside the
event log directory. It contains the following files:
APPLICATION_COMPLETE
EVENT_LOG_1
SPARK_VERSION_1.2.0 (for version 1.2.0)

This is what I have planned (and partially implemented):
 <eventlog_dir>/<application_id>/<attempt_id>/
with all three files mentioned above stored there for that specific attempt.
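
A minimal sketch of the planned layout, written in Python for brevity. It is
not Spark's code: the function name attempt_log_paths and the attempt id "1"
are made up for illustration, and the base directory and application id are
borrowed from the stack trace in the issue description.

```python
import posixpath

# File names taken from the listing above (1.2.0 layout).
ATTEMPT_FILES = ["APPLICATION_COMPLETE", "EVENT_LOG_1", "SPARK_VERSION_1.2.0"]


def attempt_log_paths(eventlog_dir, application_id, attempt_id):
    """Return the per-attempt directory and the paths of the files inside it.

    Hypothetical helper: each attempt gets its own sub-directory, so a retried
    driver never collides with the directory written by an earlier attempt.
    """
    attempt_dir = posixpath.join(eventlog_dir, application_id, attempt_id)
    files = [posixpath.join(attempt_dir, name) for name in ATTEMPT_FILES]
    return attempt_dir, files


attempt_dir, files = attempt_log_paths(
    "/user/spark/applicationHistory",
    "application_1417554558066_0003",
    "1",  # assumed attempt id format
)
```

Because the attempt id is the last path component, a second attempt of the
same application resolves to a different directory than the first one.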

This causes minimal disruption to the current way of logging events, and to
how they are rendered.
Please note that, as of now, I am making this change only for yarn-cluster
mode. The whole of it (including the UI) can be enabled for another mode or
scheduler by overriding applicationAttemptId() in the SchedulerBackend
implementation for that mode or scheduler.
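
To illustrate the override idea, here is a rough sketch in Python. The classes
below are hypothetical stand-ins, not Spark's actual SchedulerBackend; the
only name taken from this comment is the applicationAttemptId() hook, rendered
here as application_attempt_id(). The assumption is that a backend with no
notion of attempts returns nothing and keeps today's layout.

```python
import posixpath


class SchedulerBackend:
    """Hypothetical stand-in for a scheduler backend (not Spark's class)."""

    def application_attempt_id(self):
        # Default: this scheduler has no notion of attempts.
        return None


class YarnClusterBackend(SchedulerBackend):
    """Hypothetical backend for a mode where the driver can be retried."""

    def __init__(self, attempt_id):
        self._attempt_id = attempt_id

    def application_attempt_id(self):
        # Overriding the hook opts this mode into per-attempt directories.
        return self._attempt_id


def event_log_dir(base_dir, application_id, backend):
    """Pick the log directory based on whether the backend reports an attempt."""
    attempt_id = backend.application_attempt_id()
    if attempt_id is None:
        # Existing layout, unchanged for schedulers without attempts.
        return posixpath.join(base_dir, application_id)
    return posixpath.join(base_dir, application_id, attempt_id)
```

With this shape, only backends that override the hook see any change in where
their events are written.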

Regarding the UI:
Showing multiple attempts as sub-rows within the same page looks good to me
too. There are two points regarding this:
1. As of now, we don't show any succeeded/failed status, so that can probably
be taken up later. I hope I am not missing something here.
2. As of now, stats are available at the attempt level (start time, end time,
duration and last updated time). Should we aggregate some or all of these to
show at the application level, or should we just leave these stats blank for
the main row?
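
For point 2, one possible aggregation (only a sketch of an option, not a
decision) would be to take the earliest start, the latest end and the latest
update across attempts. The AttemptStats type, its field names and the use of
epoch milliseconds are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttemptStats:
    """Hypothetical per-attempt stats; times are assumed to be epoch millis."""
    start_time: int
    end_time: int
    last_updated: int

    @property
    def duration(self) -> int:
        return self.end_time - self.start_time


def aggregate_attempts(attempts: List[AttemptStats]) -> AttemptStats:
    """Collapse attempt-level stats into one application-level row."""
    if not attempts:
        raise ValueError("an application needs at least one attempt")
    return AttemptStats(
        start_time=min(a.start_time for a in attempts),
        end_time=max(a.end_time for a in attempts),
        last_updated=max(a.last_updated for a in attempts),
    )


app_row = aggregate_attempts([
    AttemptStats(start_time=1000, end_time=4000, last_updated=4500),
    AttemptStats(start_time=5000, end_time=9000, last_updated=9200),
])
```

Note that the application-level duration computed this way spans the gap
between attempts, so it is wall-clock time rather than the sum of attempt
durations. Leaving the main row blank avoids having to pick between the two.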

As multiple attempts are specific to the scheduler being used, we can leave
the current UI unchanged for schedulers that don't have multiple attempts. For
yarn-cluster mode, we can always show attempts in sub-rows, irrespective of
the number of attempts, which keeps the display consistent.

Please provide your suggestions. 

An update on the coding part: so far, I have implemented the folder structure
and its rendering for multiple attempts separately. I am now waiting for the
UI design to be finalised.

Thanks,
Twinkle

> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-4705
>                 URL: https://issues.apache.org/jira/browse/SPARK-4705
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>
> yarn-cluster mode will retry running the driver in certain failure modes. If 
> event logging is enabled, this will most probably fail, because:
> {noformat}
> Exception in thread "Driver" java.io.IOException: Log directory hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 already exists!
>         at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
>         at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
>         at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app 
> should clean up the old logs first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
