Bikas Saha created TEZ-2313:
-------------------------------
Summary: Regression in handling obsolete events in ShuffleScheduler
Key: TEZ-2313
URL: https://issues.apache.org/jira/browse/TEZ-2313
Project: Apache Tez
Issue Type: Bug
Reporter: Bikas Saha
Priority: Critical
/cc [~rohini]
When an obsolete event is received, the shuffle scheduler fails fast even
when pipelining is disabled. IIRC, obsolete inputs were supposed to fail the
shuffled inputs only if we were reading and merging partial spilled outputs.
In this case pipelining is not on, so it is not clear why we are failing fast.
{noformat}
Caused by: java.io.IOException: InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=4485], attemptNumber=1, pathComponent=null, fetchTypeInfo=FINAL_MERGE_ENABLED, spillEventId=-1] is marked as obsoleteInput, but it exists in shuffleInfoEventMap. Some data could have been already merged to memory/disk outputs. Failing the fetch early.
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.obsoleteInput(ShuffleScheduler.java:546)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleInputEventHandlerOrderedGrouped.processTaskFailedEvent(ShuffleInputEventHandlerOrderedGrouped.java:122)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleInputEventHandlerOrderedGrouped.handleEvent(ShuffleInputEventHandlerOrderedGrouped.java:73)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleInputEventHandlerOrderedGrouped.handleEvents(ShuffleInputEventHandlerOrderedGrouped.java:63)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.handleEvents(Shuffle.java:246)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.handleEvents(OrderedGroupedKVInput.java:265)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:620)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$1100(LogicalIOProcessorRuntimeTask.java:93)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:683)
    at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
{noformat}
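To make the expected behavior concrete, here is a rough, self-contained sketch of the guard I would expect: fail fast only when the obsolete input may already have contributed partial spills. This is not the actual ShuffleScheduler code. The class, the methods, and the PARTIAL_SPILL value are hypothetical stand-ins; only the FINAL_MERGE_ENABLED name is taken from the trace above, and reading it as "pipelining is off" is my assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only; none of this is taken from the Tez code base.
class ObsoleteInputSketch {

    // FINAL_MERGE_ENABLED is the value seen in the trace (assumed to mean
    // pipelining is off). PARTIAL_SPILL is an invented stand-in for an
    // input that arrives as several partial spills (pipelining on).
    enum FetchType { FINAL_MERGE_ENABLED, PARTIAL_SPILL }

    // Inputs for which at least one shuffle event has already been seen.
    private final Set<Integer> seenInputs = new HashSet<>();
    // Inputs that have been marked obsolete.
    private final Set<Integer> obsoleteInputs = new HashSet<>();

    void recordShuffleInfo(int inputIndex) {
        seenInputs.add(inputIndex);
    }

    boolean isObsolete(int inputIndex) {
        return obsoleteInputs.contains(inputIndex);
    }

    // Marks the input obsolete and reports whether the caller should fail
    // fast. Failing is only warranted when partial spills of this input
    // may already have been merged, i.e. the input is pipelined AND we
    // have already seen data for it.
    boolean obsoleteInput(int inputIndex, FetchType fetchType) {
        obsoleteInputs.add(inputIndex);
        return fetchType == FetchType.PARTIAL_SPILL
                && seenInputs.contains(inputIndex);
    }

    public static void main(String[] args) {
        ObsoleteInputSketch scheduler = new ObsoleteInputSketch();
        scheduler.recordShuffleInfo(4485);
        // Same situation as in the trace: data was seen, pipelining is off.
        System.out.println(
            scheduler.obsoleteInput(4485, FetchType.FINAL_MERGE_ENABLED));
        // Same input, but pipelined: partial data may already be merged.
        System.out.println(
            scheduler.obsoleteInput(4485, FetchType.PARTIAL_SPILL));
    }
}
```

With a guard like this, the non-pipelined case from the trace would just record the input as obsolete and carry on with a newer attempt instead of throwing.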
/cc [~rajesh.balamohan]