Yes, it is for (2). I was confused because the doc of TaskContext.attemptId in release 1.1 <http://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.TaskContext> says it is "the number of attempts to execute this task". It seems the per-task attempt id used to populate the "attempt" field in the UI is maintained by TaskSetManager, and its value is assigned in resourceOffer.
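
For reference, here is a minimal sketch that shows the mismatch from inside a closure. It is a sketch only: it assumes a local master and uses the @DeveloperApi mapPartitionsWithContext from 1.1 to get at the live TaskContext.

    import org.apache.spark.{SparkConf, SparkContext, TaskContext}

    object AttemptIdProbe {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("attempt-id-probe").setMaster("local[2]"))
        // mapPartitionsWithContext hands us the live TaskContext per partition.
        val ids = sc.parallelize(1 to 4, 2).mapPartitionsWithContext {
          (ctx: TaskContext, iter: Iterator[Int]) =>
            Iterator((ctx.partitionId, ctx.attemptId))
        }.collect()
        // attemptId comes back as the cluster-wide TID (0, 1, 2, ...),
        // not the 0-based per-task attempt shown in the UI.
        ids.foreach { case (part, id) => println(s"partition=$part attemptId=$id") }
        sc.stop()
      }
    }
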
On Mon, Oct 20, 2014 at 4:56 PM, Reynold Xin <r...@databricks.com> wrote:

> Yes, as I understand it this is for (2).
>
> Imagine a use case in which I want to save some output. In order to make
> this atomic, the program uses part_[index]_[attempt].dat, and once it
> finishes writing, it renames this to part_[index].dat.
>
> Right now [attempt] is just the TID, which could show up like this
> (assuming this is not the first stage):
>
> part_0_1000
> part_1_1001
> part_0_1002 (some retry)
> ...
>
> This is fairly confusing. The natural thing to expect is
>
> part_0_0
> part_1_0
> part_0_1
> ...
>
>
> On Mon, Oct 20, 2014 at 1:47 PM, Kay Ousterhout <k...@eecs.berkeley.edu> wrote:
>
>> Sorry, to clarify: there are two issues here:
>>
>> (1) attemptId has different meanings in the codebase
>> (2) we currently don't propagate the 0-based per-task attempt identifier
>> to the executors
>>
>> (1) should definitely be fixed. It sounds like Yin's original email was
>> requesting that we add (2).
>>
>> On Mon, Oct 20, 2014 at 1:45 PM, Kay Ousterhout <k...@eecs.berkeley.edu> wrote:
>>
>>> Are you guys sure this is a bug? In the task scheduler, we keep two
>>> identifiers for each task: the "index", which uniquely identifies the
>>> computation+partition, and the "taskId", which is unique across all tasks
>>> for that Spark context (see
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L439).
>>> If multiple attempts of one task are run, they will have the same index
>>> but different taskIds. Historically, we have used "taskId" and
>>> "taskAttemptId" interchangeably (which arose from naming in Mesos, which
>>> uses similar naming).
>>>
>>> This was complicated when Mr. Xin added the "attempt" field to TaskInfo,
>>> which we show in the UI. This field uniquely identifies attempts for a
>>> particular task, but is not unique across different task indexes (it
>>> always starts at 0 for a given task). I'm guessing the right fix is to
>>> rename Task.taskAttemptId to Task.taskId to resolve this inconsistency --
>>> does that sound right to you, Reynold?
>>>
>>> -Kay
>>>
>>> On Mon, Oct 20, 2014 at 1:29 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>
>>>> There is a deeper issue here, which is that AFAIK we don't even store
>>>> a notion of attempt inside of Spark; we just use a new taskId with the
>>>> same index.
>>>>
>>>> On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>>>> > Yeah, it seems we need to pass the attempt id to executors through
>>>> > TaskDescription. I have created
>>>> > https://issues.apache.org/jira/browse/SPARK-4014.
>>>> >
>>>> > On Mon, Oct 20, 2014 at 1:57 PM, Reynold Xin <r...@databricks.com> wrote:
>>>> >
>>>> >> I also ran into this earlier. It is a bug. Do you want to file a
>>>> >> jira?
>>>> >>
>>>> >> I think part of the problem is that we don't actually have the
>>>> >> attempt id on the executors. If we do, that's great. If not, we'd
>>>> >> need to propagate it over.
>>>> >>
>>>> >> On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai <huaiyin....@gmail.com> wrote:
>>>> >>
>>>> >>> Hello,
>>>> >>>
>>>> >>> Is there any way to get the attempt number in a closure? It seems
>>>> >>> TaskContext.attemptId actually returns the taskId of a task (see this
>>>> >>> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181>
>>>> >>> and this
>>>> >>> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47>).
>>>> >>> It looks like a bug.
>>>> >>>
>>>> >>> Thanks,
>>>> >>>
>>>> >>> Yin
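
P.S. To make Reynold's rename scheme concrete, here is a minimal sketch of the write-then-rename pattern. Note that attemptNumber here is hypothetical: it stands for the 0-based per-task attempt that SPARK-4014 proposes to propagate to executors through TaskDescription.

    import java.nio.file.{Files, Paths, StandardCopyOption}

    // Write a partition's output under a temp name, then atomically rename
    // it to commit. `attemptNumber` is the hypothetical 0-based per-task
    // attempt; today only the TID is visible in a closure.
    def writePartitionAtomically(index: Int, attemptNumber: Int, data: Array[Byte]): Unit = {
      val tmp  = Paths.get(s"part_${index}_${attemptNumber}.dat") // e.g. part_0_1 for a retry
      val dest = Paths.get(s"part_$index.dat")
      Files.write(tmp, data)
      // The rename is the commit point: whichever attempt renames first
      // wins, and a failed attempt leaves behind only its own temp file.
      Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE)
    }

With a 0-based attempt number the temp files come out as part_0_0, part_0_1, ... rather than part_0_1000, part_0_1002, which is exactly the naming Reynold describes above.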