[
https://issues.apache.org/jira/browse/TEZ-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520693#comment-14520693
]
Jeff Zhang commented on TEZ-1521:
---------------------------------
[~hitesh] The root_input_data_information_event may be routed twice. When it is
first routed, the vertex has not yet been scheduled, so the event is put in the
pending queue and routed again later. This causes the RecordReader to be
initialized twice during recovery, which leads to unexpected behavior (some of
the data may be missed).
{code}
case ROOT_INPUT_DATA_INFORMATION_EVENT:
  if (vertex.tasksNotYetScheduled) {
    // Tasks are not scheduled yet: buffer the event. It is routed
    // again from pendingTaskEvents once the tasks are scheduled.
    vertex.pendingTaskEvents.add(tezEvent);
  } else {
    checkEventSourceMetadata(vertex, sourceMeta);
    InputDataInformationEvent riEvent =
        (InputDataInformationEvent) tezEvent.getEvent();
    // Deliver the event to the task at the event's target index.
    Task targetTask = vertex.getTask(riEvent.getTargetIndex());
    targetTask.registerTezEvent(tezEvent);
  }
  break;
{code}
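A minimal Java sketch of the sequence described above, under stated assumptions: the class `DoubleLogDemo` and all of its fields and methods are hypothetical stand-ins (not Tez classes), events are plain strings, the recovery log is assumed to be written every time an event passes through the routing entry point, and recovery is assumed to hand each logged entry to the task once. Under those assumptions one event ends up in the log twice, so replaying the log registers it on the task twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a vertex that buffers events while its
// tasks are not yet scheduled. Names only mirror the snippet above; this is
// NOT the Tez VertexImpl API.
public class DoubleLogDemo {
    final List<String> recoveryLog = new ArrayList<>();       // stand-in for the recovery log
    final List<String> pendingTaskEvents = new ArrayList<>();
    final List<String> registeredOnTask = new ArrayList<>();  // stand-in for registerTezEvent
    boolean tasksNotYetScheduled = true;

    // Assumption: every call logs the event, then either buffers or routes it.
    void routeEvent(String event) {
        recoveryLog.add(event);
        if (tasksNotYetScheduled) {
            pendingTaskEvents.add(event);
        } else {
            registeredOnTask.add(event);
        }
    }

    // Once tasks are scheduled, buffered events re-enter routeEvent,
    // so each of them is written to the recovery log a second time.
    void onTasksScheduled() {
        tasksNotYetScheduled = false;
        List<String> drained = new ArrayList<>(pendingTaskEvents);
        pendingTaskEvents.clear();
        for (String event : drained) {
            routeEvent(event);
        }
    }

    // Assumption: during recovery each logged entry is handed to the task once,
    // i.e. one RecordReader initialization per log entry.
    static List<String> replayToTask(List<String> log) {
        List<String> registered = new ArrayList<>();
        for (String event : log) {
            registered.add(event);
        }
        return registered;
    }

    public static void main(String[] args) {
        DoubleLogDemo vertex = new DoubleLogDemo();
        vertex.routeEvent("InputDataInformationEvent[target=0]");
        vertex.onTasksScheduled();
        System.out.println("registered in original run: " + vertex.registeredOnTask.size());
        System.out.println("entries in recovery log: " + vertex.recoveryLog.size());
        System.out.println("registered after recovery: " + replayToTask(vertex.recoveryLog).size());
    }
}
```

Running `main` prints 1, 2 and 2: the original run registers the event once, but the duplicate log entry makes the recovered run register it twice.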
> VertexDataMovementEventsGeneratedEvent may be logged twice in recovery log
> ---------------------------------------------------------------------------
>
> Key: TEZ-1521
> URL: https://issues.apache.org/jira/browse/TEZ-1521
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: TEZ-1521-1.patch
>
>
> TezEvents may be added to pendingTaskEvents and routed again later when the
> task is not yet scheduled. In this case, VertexDataMovementEventsGeneratedEvent
> will be logged twice.
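One possible way to avoid the duplicate entry, sketched in Java under stated assumptions: log an event only when it is first generated, and replay buffered events through a route-only path that never touches the log. The class `LogOnceDemo` and its members are hypothetical and simplified; this is an illustration of the idea, not the attached TEZ-1521-1.patch or the Tez API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: separate "log once" from "route", so replaying
// buffered events cannot add a second recovery-log entry.
// Not the Tez VertexImpl API and not the TEZ-1521 patch.
public class LogOnceDemo {
    final List<String> recoveryLog = new ArrayList<>();       // stand-in for the recovery log
    final List<String> pendingTaskEvents = new ArrayList<>();
    final List<String> registeredOnTask = new ArrayList<>();  // stand-in for registerTezEvent
    boolean tasksNotYetScheduled = true;

    // Entry point for newly generated events: log exactly once, then route.
    void onEventsGenerated(String event) {
        recoveryLog.add(event);
        route(event);
    }

    // Route-only path: buffers or delivers, never writes to the recovery log.
    private void route(String event) {
        if (tasksNotYetScheduled) {
            pendingTaskEvents.add(event);
        } else {
            registeredOnTask.add(event);
        }
    }

    // Replay buffered events through the route-only path.
    void onTasksScheduled() {
        tasksNotYetScheduled = false;
        List<String> drained = new ArrayList<>(pendingTaskEvents);
        pendingTaskEvents.clear();
        drained.forEach(this::route);
    }

    public static void main(String[] args) {
        LogOnceDemo vertex = new LogOnceDemo();
        vertex.onEventsGenerated("InputDataInformationEvent[target=0]");
        vertex.onTasksScheduled();
        System.out.println("entries in recovery log: " + vertex.recoveryLog.size());
        System.out.println("registered on task: " + vertex.registeredOnTask.size());
    }
}
```

Running `main` prints 1 and 1: the buffered event is delivered once and appears in the log once.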
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)