[ 
https://issues.apache.org/jira/browse/TEZ-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520693#comment-14520693
 ] 

Jeff Zhang commented on TEZ-1521:
---------------------------------

[~hitesh] The root_input_data_information_event may be routed twice. When it is 
first routed, the vertex has not been scheduled, so the event is put in the 
pending queue and routed again later. This causes the RecordReader to be 
initialized twice in recovery, which can lead to incorrect behavior (part of 
the data may be missed).

{code}
      case ROOT_INPUT_DATA_INFORMATION_EVENT:
        if (vertex.tasksNotYetScheduled) {
          vertex.pendingTaskEvents.add(tezEvent);
        } else {
          checkEventSourceMetadata(vertex, sourceMeta);
          InputDataInformationEvent riEvent =
              (InputDataInformationEvent) tezEvent.getEvent();
          Task targetTask = vertex.getTask(riEvent.getTargetIndex());
          targetTask.registerTezEvent(tezEvent);
        }
        break;
{code}
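
To make the double routing concrete, here is a minimal standalone sketch of the pattern described above. It is not Tez code: the class name, the string events, and the recoveryLog, delivered and scheduleTasks names are hypothetical stand-ins, and it assumes the side effect (logging to the recovery log) happens on every pass through the routing method, including the pass that only parks the event in the pending queue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a vertex; none of these names are Tez APIs.
public class DoubleRoutingSketch {
    final List<String> recoveryLog = new ArrayList<>();       // assumed stand-in for the recovery log
    final List<String> pendingTaskEvents = new ArrayList<>(); // events parked until tasks are scheduled
    final List<String> delivered = new ArrayList<>();         // events actually handed to a task
    boolean tasksNotYetScheduled = true;

    // Assumption: the side effect runs on every pass, even when the event is only parked.
    void route(String event) {
        recoveryLog.add(event);
        if (tasksNotYetScheduled) {
            pendingTaskEvents.add(event);
        } else {
            delivered.add(event);
        }
    }

    // Once tasks are scheduled, parked events go through route() a second time.
    void scheduleTasks() {
        tasksNotYetScheduled = false;
        List<String> drained = new ArrayList<>(pendingTaskEvents);
        pendingTaskEvents.clear();
        for (String event : drained) {
            route(event);
        }
    }

    public static void main(String[] args) {
        DoubleRoutingSketch vertex = new DoubleRoutingSketch();
        vertex.route("ROOT_INPUT_DATA_INFORMATION_EVENT#0"); // first pass: parked
        vertex.scheduleTasks();                              // second pass: delivered
        System.out.println("delivered=" + vertex.delivered.size()
            + " logged=" + vertex.recoveryLog.size());
        // prints "delivered=1 logged=2"
    }
}
```

In this sketch the event reaches its task once but is recorded twice, which is the kind of duplication that would be replayed during recovery.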

> VertexDataMovementEventsGeneratedEvent may be logged twice in recovery log 
> ---------------------------------------------------------------------------
>
>                 Key: TEZ-1521
>                 URL: https://issues.apache.org/jira/browse/TEZ-1521
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: TEZ-1521-1.patch
>
>
> The TezEvents may be added to pendingTaskEvents and routed again later when 
> the task is not yet scheduled. In this case, VertexDataMovementEventsGeneratedEvent 
> will be logged twice.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
