Yes, as I understand it this is for (2).

Imagine a use case in which I want to save some output. To make this
atomic, the program writes to part_[index]_[attempt].dat and, once it
finishes writing, renames that file to part_[index].dat (sketched below).

Right now [attempt] is just the TID, so the files could show up like this
(assuming this is not the first stage):

part_0_1000
part_1_1001
part_0_1002 (some retry)
...

This is fairly confusing. The natural thing to expect is

part_0_0
part_1_0
part_0_1
...
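
Just to make that concrete, here is a rough sketch of what the closure
could look like, assuming we had a 0-based per-task attempt number on
TaskContext (called attemptNumber below; that accessor is hypothetical
here and is exactly what SPARK-4014 asks us to propagate to executors):

import java.nio.file.{Files, Paths, StandardCopyOption}
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

object AtomicPartWriter {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("atomic-part-writer").setMaster("local[2]"))

    sc.parallelize(1 to 100, numSlices = 4).foreachPartition { iter =>
      val ctx     = TaskContext.get()
      val index   = ctx.partitionId()
      // Hypothetical 0-based per-task attempt id -- the value this thread
      // asks for; today the closure only has access to the global TID.
      val attempt = ctx.attemptNumber()

      val tmp  = Paths.get(s"part_${index}_${attempt}.dat")
      val dest = Paths.get(s"part_${index}.dat")

      // Write to an attempt-specific file so a retried task never clobbers
      // output that another attempt is still producing...
      Files.write(tmp, iter.mkString("\n").getBytes("UTF-8"))
      // ...then publish the finished file with a single atomic rename.
      Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE)
    }

    sc.stop()
  }
}

With the TID substituted for attempt you get the first listing above; with
a real per-task attempt number you would get the second.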



On Mon, Oct 20, 2014 at 1:47 PM, Kay Ousterhout <k...@eecs.berkeley.edu>
wrote:

> Sorry to clarify, there are two issues here:
>
> (1) attemptId has different meanings in the codebase
> (2) we currently don't propagate the 0-based per-task attempt identifier
> to the executors.
>
> (1) should definitely be fixed.  It sounds like Yin's original email was
> requesting that we add (2).
>
> On Mon, Oct 20, 2014 at 1:45 PM, Kay Ousterhout <k...@eecs.berkeley.edu>
> wrote:
>
>> Are you guys sure this is a bug?  In the task scheduler, we keep two
>> identifiers for each task: the "index", which uniquely identifies the
>> computation+partition, and the "taskId", which is unique across all tasks
>> for that Spark context (see
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L439).
>> If multiple attempts of one task are run, they will have the same index
>> but different taskIds.  Historically, we have used "taskId" and
>> "taskAttemptId" interchangeably (a convention that arose from Mesos,
>> which uses similar naming).
>>
>> This was complicated when Mr. Xin added the "attempt" field to TaskInfo,
>> which we show in the UI.  This field uniquely identifies attempts for a
>> particular task, but is not unique across different task indexes (it always
>> starts at 0 for a given task).  I'm guessing the right fix is to rename
>> Task.taskAttemptId to Task.taskId to resolve this inconsistency -- does
>> that sound right to you Reynold?
>>
>> -Kay
>>
>> On Mon, Oct 20, 2014 at 1:29 PM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>>
>>> There is a deeper issue here, which is that AFAIK we don't even store
>>> a notion of attempt inside of Spark; we just use a new taskId with the
>>> same index.
>>>
>>> On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai <huaiyin....@gmail.com>
>>> wrote:
>>> > Yeah, seems we need to pass the attempt id to executors through
>>> > TaskDescription. I have created
>>> > https://issues.apache.org/jira/browse/SPARK-4014.
>>> >
>>> > On Mon, Oct 20, 2014 at 1:57 PM, Reynold Xin <r...@databricks.com>
>>> wrote:
>>> >
>>> >> I also ran into this earlier. It is a bug. Do you want to file a jira?
>>> >>
>>> >> I think part of the problem is that we don't actually have the
>>> >> attempt id on the executors. If we do, that's great. If not, we'd
>>> >> need to propagate that over.
>>> >>
>>> >> On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai <huaiyin....@gmail.com>
>>> wrote:
>>> >>
>>> >>> Hello,
>>> >>>
>>> >>> Is there any way to get the attempt number in a closure? Seems
>>> >>> TaskContext.attemptId actually returns the taskId of a task (see
>>> >>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
>>> >>> and
>>> >>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47).
>>> >>> It looks like a bug.
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> Yin
>>> >>>
>>> >>
>>> >>
>>>
>>>
>>
>
