Are you guys sure this is a bug?  In the task scheduler, we keep two
identifiers for each task: the "index", which uniquely identifies the
computation+partition, and the "taskId", which is unique across all tasks
for that Spark context (see
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L439).
If multiple attempts of one task are run, they will have the same index
but different taskIds.  Historically, we have used "taskId" and
"taskAttemptId" interchangeably (a naming convention carried over from
Mesos, which uses similar terminology).
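
To make that distinction concrete, here's a tiny, purely illustrative
Scala sketch -- the case class and values below are made up for the
example, not the actual scheduler types:

    // Illustrative only: not the real scheduler classes.
    object TaskIdSketch extends App {
      case class TaskIds(index: Int, taskId: Long)

      // Two attempts of the same task (same computation+partition) share
      // the index but get distinct, context-unique taskIds.
      val firstAttempt  = TaskIds(index = 3, taskId = 41L)
      val secondAttempt = TaskIds(index = 3, taskId = 57L)

      assert(firstAttempt.index == secondAttempt.index)   // same index
      assert(firstAttempt.taskId != secondAttempt.taskId) // different taskIds
    }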

This was complicated when Mr. Xin added the "attempt" field to TaskInfo,
which we show in the UI.  That field uniquely identifies attempts of a
particular task, but it is not unique across different task indexes (it
always starts at 0 for a given task).  I'm guessing the right fix is to
rename Task.taskAttemptId to Task.taskId to resolve this inconsistency --
does that sound right to you, Reynold?
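
For illustration, here's another made-up sketch of how the attempt
numbering behaves (Info here is a hypothetical stand-in, not the real
TaskInfo): attempt restarts at 0 for every index, so only (index, attempt)
identifies an attempt within a task set, while taskId is unique on its own.

    // Illustrative only: Info is not the real TaskInfo class.
    object AttemptSketch extends App {
      case class Info(index: Int, attempt: Int, taskId: Long)

      val infos = Seq(
        Info(index = 0, attempt = 0, taskId = 10L),
        Info(index = 0, attempt = 1, taskId = 14L), // retry of index 0
        Info(index = 1, attempt = 0, taskId = 11L)  // attempt 0 again, new index
      )

      // attempt values collide across indexes...
      assert(infos.map(_.attempt).distinct.size < infos.size)
      // ...but (index, attempt) pairs and taskIds are each unique.
      assert(infos.map(i => (i.index, i.attempt)).distinct.size == infos.size)
      assert(infos.map(_.taskId).distinct.size == infos.size)
    }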

-Kay

On Mon, Oct 20, 2014 at 1:29 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> There is a deeper issue here which is AFAIK we don't even store a
> notion of attempt inside of Spark, we just use a new taskId with the
> same index.
>
> On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai <huaiyin....@gmail.com> wrote:
> > Yeah, seems we need to pass the attempt id to executors through
> > TaskDescription. I have created
> > https://issues.apache.org/jira/browse/SPARK-4014.
> >
> > On Mon, Oct 20, 2014 at 1:57 PM, Reynold Xin <r...@databricks.com> wrote:
> >
> >> I also ran into this earlier. It is a bug. Do you want to file a jira?
> >>
> >> I think part of the problem is that we don't actually have the attempt id
> >> on the executors. If we do, that's great. If not, we'd need to propagate
> >> that over.
> >>
> >> On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai <huaiyin....@gmail.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> Is there any way to get the attempt number in a closure? Seems
> >>> TaskContext.attemptId actually returns the taskId of a task (see
> >>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
> >>> and
> >>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47).
> >>> It looks like a bug.
> >>>
> >>> Thanks,
> >>>
> >>> Yin
> >>>
> >>
> >>
>
