[ https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305635#comment-14305635 ]
Marcelo Vanzin commented on SPARK-4705:
---------------------------------------

Hi [~twinkle],

bq. Please note that as of now, I am doing this change only for yarn-cluster mode

Is there any limitation with other cluster managers that prevents you from also supporting them? I know I filed the bug with "yarn-cluster" in the summary, but standalone cluster most probably suffers from the same issue if you run with the "--supervise" flag.

bq. leave the current UI intact for those who don't have multiple attempts

I think that's good, but it doesn't require yarn-cluster-specific logic. All you need to check is whether an application has one attempt or multiple attempts, and render things slightly differently. For example, with a single attempt:

|| App Id || App Name || Attempt Id || Started || ... ||
| app-1 | MyApp | | 201500204 | ... |

With multiple attempts (sorry, I don't know how to do it in JIRA markup):

{code}
<table border="1">
  <tr><th>App Id</th><th>App Name</th><th>Attempt Id</th><th>Started</th><th>...</th></tr>
  <tr><td rowspan="2">app-2</td><td rowspan="2">MyApp</td><td>2</td><td>201500205</td><td>...</td></tr>
  <tr><td>1</td><td>201500204</td><td>...</td></tr>
</table>
{code}

(You can paste that at http://htmledit.squarefree.com/ to see it, or load it in your browser somehow.)

You'd have that new "attempt id" column, but I think that's ok. We can look at exposing other things like the final status separately.

> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-4705
>                 URL: https://issues.apache.org/jira/browse/SPARK-4705
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>
> yarn-cluster mode will retry to run the driver in certain failure modes.
> If event logging is enabled, this will most probably fail, because:
> {noformat}
> Exception in thread "Driver" java.io.IOException: Log directory hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 already exists!
>         at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
>         at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
>         at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app should clean up the old logs first.
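
The single-attempt versus multiple-attempt rendering suggested in the comment above could be sketched roughly as follows. This is only an illustration in Python, not the history server's actual code: the function name `render_rows` and the shape of the input (an app id mapped to a name and a list of `(attempt_id, started)` pairs) are assumptions made up for the example.

```python
from html import escape


def render_rows(apps):
    """Build <tr> strings for a listing of applications.

    `apps` is a hypothetical structure:
        {app_id: (app_name, [(attempt_id, started), ...])}
    """
    rows = []
    for app_id, (name, attempts) in apps.items():
        # Newest attempt first, matching the example table in the comment.
        ordered = sorted(attempts, key=lambda a: a[0], reverse=True)
        span = len(ordered)
        for i, (attempt_id, started) in enumerate(ordered):
            cells = []
            if i == 0:
                # App id and name appear once; they span all attempt rows
                # only when there is more than one attempt.
                attr = f' rowspan="{span}"' if span > 1 else ""
                cells.append(f"<td{attr}>{escape(app_id)}</td>")
                cells.append(f"<td{attr}>{escape(name)}</td>")
            # With a single attempt the attempt id cell is left empty.
            shown = "" if span == 1 else str(attempt_id)
            cells.append(f"<td>{escape(shown)}</td>")
            cells.append(f"<td>{escape(str(started))}</td>")
            rows.append("<tr>" + "".join(cells) + "</tr>")
    return rows
```

The only branch is on the number of attempts, so nothing here depends on which cluster manager produced the application.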