Yes, as I understand it this is for (2). Imagine a use case in which I want to save some output. To make this atomic, the program writes to part_[index]_[attempt].dat and, once it finishes writing, renames the file to part_[index].dat.
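Roughly the pattern I have in mind, as a sketch (the attempt parameter here is the hypothetical 0-based per-task attempt number we'd like to see on the executor, not something the current API hands to the closure):

  import java.nio.charset.StandardCharsets
  import java.nio.file.{Files, Paths, StandardCopyOption}

  // Sketch only: write to a per-attempt temp file, then atomically rename it
  // to the final per-partition name once the write succeeds. A failed attempt
  // only leaves its own part_[index]_[attempt].dat behind.
  def savePartition(dir: String, index: Int, attempt: Int, data: String): Unit = {
    val tmp = Paths.get(dir, s"part_${index}_${attempt}.dat")
    Files.write(tmp, data.getBytes(StandardCharsets.UTF_8))
    Files.move(tmp, Paths.get(dir, s"part_$index.dat"), StandardCopyOption.ATOMIC_MOVE)
  }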
Right now [attempt] is just the TID, so the files could show up like this (assuming this is not the first stage):

  part_0_1000
  part_1_1001
  part_0_1002  (a retry of partition 0)
  ...

This is fairly confusing. The natural thing to expect is:

  part_0_0
  part_1_0
  part_0_1
  ...

On Mon, Oct 20, 2014 at 1:47 PM, Kay Ousterhout <k...@eecs.berkeley.edu> wrote:

> Sorry to clarify, there are two issues here:
>
> (1) attemptId has different meanings in the codebase
> (2) we currently don't propagate the 0-based per-task attempt identifier
> to the executors.
>
> (1) should definitely be fixed. It sounds like Yin's original email was
> requesting that we add (2).
>
> On Mon, Oct 20, 2014 at 1:45 PM, Kay Ousterhout <k...@eecs.berkeley.edu>
> wrote:
>
>> Are you guys sure this is a bug? In the task scheduler, we keep two
>> identifiers for each task: the "index", which uniquely identifies the
>> computation+partition, and the "taskId", which is unique across all
>> tasks for that Spark context (see
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L439).
>> If multiple attempts of one task are run, they will have the same
>> index but different taskIds. Historically, we have used "taskId" and
>> "taskAttemptId" interchangeably (which arose from similar naming in
>> Mesos).
>>
>> This was complicated when Mr. Xin added the "attempt" field to
>> TaskInfo, which we show in the UI. This field uniquely identifies
>> attempts for a particular task, but is not unique across different
>> task indexes (it always starts at 0 for a given task). I'm guessing
>> the right fix is to rename Task.taskAttemptId to Task.taskId to
>> resolve this inconsistency -- does that sound right to you, Reynold?
>>
>> -Kay
>>
>> On Mon, Oct 20, 2014 at 1:29 PM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>>
>>> There is a deeper issue here, which is that AFAIK we don't even store
>>> a notion of attempt inside of Spark; we just use a new taskId with
>>> the same index.
>>>
>>> On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai <huaiyin....@gmail.com>
>>> wrote:
>>> > Yeah, it seems we need to pass the attempt id to executors through
>>> > TaskDescription. I have created
>>> > https://issues.apache.org/jira/browse/SPARK-4014.
>>> >
>>> > On Mon, Oct 20, 2014 at 1:57 PM, Reynold Xin <r...@databricks.com>
>>> > wrote:
>>> >
>>> >> I also ran into this earlier. It is a bug. Do you want to file a
>>> >> jira?
>>> >>
>>> >> I think part of the problem is that we don't actually have the
>>> >> attempt id on the executors. If we do, that's great. If not, we'd
>>> >> need to propagate that over.
>>> >>
>>> >> On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai <huaiyin....@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> Hello,
>>> >>>
>>> >>> Is there any way to get the attempt number in a closure? It seems
>>> >>> TaskContext.attemptId actually returns the taskId of a task (see
>>> >>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
>>> >>> and
>>> >>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47).
>>> >>> It looks like a bug.
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> Yin
>>> >>
>>> >>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>
>
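P.S. To make sure I'm reading the thread right, here is how I understand the three identifiers Kay describes -- an illustrative sketch only, not the actual scheduler types:

  // Illustrative only, not the real scheduler classes.
  // index:   identifies the computation + partition; stable across retries
  // taskId:  the TID, unique across all tasks for the SparkContext
  // attempt: 0-based per-task counter (the "attempt" field shown in the UI)
  case class TaskIds(index: Int, taskId: Long, attempt: Int)

  val firstAttempts = Seq(
    TaskIds(index = 0, taskId = 1000L, attempt = 0),
    TaskIds(index = 1, taskId = 1001L, attempt = 0)
  )
  // A retry of partition 0 gets a fresh taskId but keeps its index and bumps attempt:
  val retry = TaskIds(index = 0, taskId = 1002L, attempt = 1)

What (2) asks for is making that attempt value visible inside the closure, so the file names above come out as part_0_0, part_1_0, part_0_1, ...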