[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

Marcelo Vanzin (JIRA) Wed, 04 Feb 2015 10:10:16 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305635#comment-14305635
 ]


Marcelo Vanzin commented on SPARK-4705:
---------------------------------------

Hi [~twinkle],

bq. Please note that as of now, I am doing this change only for yarn-cluster 
mode

Is there any limitation with other cluster managers that prevents you from also 
supporting them? I know I filed the bug with "yarn-cluster" in the summary, but 
standalone cluster most probably suffers from the same issue if you run with 
the "--supervise" flag.

bq.  leave the current UI intact for those who don't have multiple attempts

I think that's good, but it doesn't require yarn-cluster-specific logic. All 
you need to check is whether an application has one attempt or multiple 
attempts, and render things slightly differently. For example, with a single 
attempt:

|| App Id || App Name || Attempt Id || Started || ... ||
| app-1 | MyApp | | 201500204 | ...|

With multiple attempts (sorry, I don't know how to do this in JIRA markup):

{code}
<table border="1">
  <tr><th>App Id</th><th>App Name</th><th>Attempt Id</th><th>Started</th><th>...</th></tr>
  <tr><td rowspan="2">app-2</td><td rowspan="2">MyApp</td><td>2</td><td>201500205</td><td>...</td></tr>
  <tr><td>1</td><td>201500204</td><td>...</td></tr>
</table>
{code}

(You can paste that at http://htmledit.squarefree.com/ to see it, or load it in 
your browser somehow.)

You'd have that new "attempt id" column, but I think that's ok. We can look at 
exposing other things like the final status separately.
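
A minimal sketch of the single-vs-multiple-attempt rendering described above, 
written in Python purely for illustration. It is not the history server's 
code, and the function name and dictionary keys (app_id, name, attempts, 
attempt_id, started) are hypothetical. The only logic is: emit the app-level 
cells once, and add a rowspan only when there is more than one attempt.

```python
from html import escape


def render_rows(apps):
    """Return one <tr> string per attempt.

    `apps` is a list of dicts with hypothetical keys:
      app_id, name, attempts (list of dicts with attempt_id and started).
    """
    rows = []
    for app in apps:
        attempts = app["attempts"]
        span = len(attempts)
        for i, attempt in enumerate(attempts):
            cells = []
            if i == 0:
                # App-level cells appear once; they span all attempt rows
                # only when the app actually has more than one attempt.
                attr = f' rowspan="{span}"' if span > 1 else ""
                cells.append(f"<td{attr}>{escape(app['app_id'])}</td>")
                cells.append(f"<td{attr}>{escape(app['name'])}</td>")
            cells.append(f"<td>{escape(str(attempt.get('attempt_id', '')))}</td>")
            cells.append(f"<td>{escape(str(attempt['started']))}</td>")
            rows.append("<tr>" + "".join(cells) + "</tr>")
    return rows
```

With a single attempt this produces a plain row with an empty attempt id cell, 
so the current layout stays intact apart from the extra column.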

> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-4705
>                 URL: https://issues.apache.org/jira/browse/SPARK-4705
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>
> yarn-cluster mode will retry running the driver in certain failure modes. If 
> event logging is enabled, the retry will most probably fail, because:
> {noformat}
> Exception in thread "Driver" java.io.IOException: Log directory hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 already exists!
>         at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
>         at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
>         at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app 
> should clean up the old logs first.
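
One way to read the "more unique" suggestion in the description above is to 
include the attempt id in the log directory name. The sketch below, in Python 
for illustration only, assumes a hypothetical naming scheme 
(<app id>_<attempt id>); it is not taken from Spark, and the function name is 
made up.

```python
import posixpath


def event_log_dir(base_dir, app_id, attempt_id=None):
    """Build a per-attempt event log directory path.

    Hypothetical scheme: append the attempt id to the application id so a
    retried driver does not collide with the directory of an earlier attempt.
    """
    name = app_id if attempt_id is None else f"{app_id}_{attempt_id}"
    return posixpath.join(base_dir, name)
```

When no attempt id is available the path is unchanged, so cluster managers 
without retries would keep their current layout.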



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
