[ https://issues.apache.org/jira/browse/SPARK-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739615#comment-14739615 ]
Ryan Williams commented on SPARK-10551:
---------------------------------------

Here is the full event log: https://www.dropbox.com/s/2633xxc3wi4v3qp/application_1439224376754_5702?dl=0

> Successful task-end event after task failed due to executor loss
> ----------------------------------------------------------------
>
>                 Key: SPARK-10551
>                 URL: https://issues.apache.org/jira/browse/SPARK-10551
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.1
>            Reporter: Ryan Williams
>
> Doing forensics on some failed Spark applications and seeing nonsensical
> things in the event logs, e.g.:
> {code}
> $ grep -n '"Task ID":12083' application_1439224376754_5702
> 24578:{"Event":"SparkListenerTaskStart","Stage ID":6,"Stage Attempt ID":0,"Task Info":{"Task ID":12083,"Index":145,"Attempt":0,"Launch Time":1440703704768,"Executor ID":"232","Host":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":0,"Failed":false,"Accumulables":[]}}
> 28918:{"Event":"SparkListenerTaskEnd","Stage ID":6,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"ExecutorLostFailure","Executor ID":"232"},"Task Info":{"Task ID":12083,"Index":145,"Attempt":0,"Launch Time":1440703704768,"Executor ID":"232","Host":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":1440703707747,"Failed":true,"Accumulables":[]}}
> 29062:{"Event":"SparkListenerTaskEnd","Stage ID":6,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":12083,"Index":145,"Attempt":0,"Launch Time":1440703704768,"Executor ID":"232","Host":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":1440703707747,"Failed":true,"Accumulables":[]},"Task Metrics":{"Host Name":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Executor Deserialize Time":181,"Executor Run Time":1585,"Result Size":8760,"JVM GC Time":0,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Write Metrics":{"Shuffle Bytes Written":454121,"Shuffle Write Time":43293396,"Shuffle Records Written":2549},"Input Metrics":{"Data Read Method":"Memory","Bytes Read":810520,"Records Read":2549}}}
> {code}
> Task ID 12083 has a TaskStart event, a TaskEnd event indicating that the task
> failed due to {{ExecutorLostFailure}}, and then a TaskEnd event saying that
> the task succeeded.
> The history server is not showing me this file in the "complete" or
> "incomplete" sections, though it has this line in its stdout (and no apparent
> exceptions later), which I thought meant that it parsed the file correctly:
> {code}
> 15/09/10 17:57:56 INFO FsHistoryProvider: Replaying log path: hdfs://demeter-nn1.demeter.hpc.mssm.edu/spark/tmp/logs/willir31/application_1439224376754_5702
> {code}
> [~arahuja] ran this application originally and says that the live web UI was
> showing inconsistent/nonsensical data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
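
For anyone who wants to check their own event logs for the same pattern (more than one TaskEnd event for a single task ID), the sketch below may help. It assumes the log is one JSON object per line and uses only the field names visible in the excerpt quoted above ("Event", "Task Info", "Task ID", "Task End Reason", "Reason"). The function name `find_conflicting_task_ends` is hypothetical and not part of Spark.

```python
import json
from collections import defaultdict


def find_conflicting_task_ends(lines):
    """Return {task_id: [reason, ...]} for tasks with more than one TaskEnd event.

    `lines` is any iterable of strings, e.g. an open event-log file.
    Assumes one JSON event per line, as in the excerpt quoted in the issue.
    """
    reasons = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            # Skip truncated or non-JSON lines rather than aborting the scan.
            continue
        if not isinstance(event, dict):
            continue
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        task_id = event["Task Info"]["Task ID"]
        reason = event["Task End Reason"]["Reason"]
        reasons[task_id].append(reason)
    # Keep only tasks that were reported as ended more than once.
    return {tid: rs for tid, rs in reasons.items() if len(rs) > 1}
```

Passing an open file handle for the event log (for example `find_conflicting_task_ends(open(path))`, where `path` is the local copy of the log) returns a dict keyed by task ID, with the end reasons listed in the order they appear in the log.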