Hitesh Sharma created TEZ-3996:
----------------------------------

             Summary: Reorder input failed events before data movement events
                 Key: TEZ-3996
                 URL: https://issues.apache.org/jira/browse/TEZ-3996
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Hitesh Sharma


We have a custom processor (AbstractLogicalIOProcessor) that waits for a 
DataMovementEvent to arrive and then starts an external process to do some 
work. When an input is revoked, the processor receives an 
InputFailedEvent, which tells it about the failed input, and we fail the 
processor because it is working on old inputs. When the new inputs are 
available, Tez restarts the processor and sends the InputFailedEvent along 
with all the DataMovementEvents, which include both the older versions and 
the new version of the input that was revoked.

The issue we are seeing is that the events arrive out of order, i.e. we often 
see the older DataMovementEvent first, at which point our processor thinks it 
is safe to start. We then receive the InputFailedEvent and the new version of 
the DataMovementEvent, but by then it is too late and the processor fails. 
This repeats on every subsequent task attempt, and the task eventually fails.
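
To illustrate the ordering requested in the summary, here is a minimal sketch 
of delivering failed-input events ahead of data movement events. It does not 
use the Tez API: EventReorder, Kind, Event and reorder are hypothetical 
stand-ins for InputFailedEvent and DataMovementEvent handling, and where such 
a reordering would live inside Tez is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Tez code: models a batch of events sent to a
// restarted task attempt and moves failure notifications to the front.
final class EventReorder {

    // Stand-ins for the two Tez event types discussed in this issue.
    enum Kind { INPUT_FAILED, DATA_MOVEMENT }

    static final class Event {
        final Kind kind;
        final int sourceIndex; // which upstream input the event refers to
        final int version;     // attempt/version of that input

        Event(Kind kind, int sourceIndex, int version) {
            this.kind = kind;
            this.sourceIndex = sourceIndex;
            this.version = version;
        }

        @Override
        public String toString() {
            return kind + "(src=" + sourceIndex + ", v=" + version + ")";
        }
    }

    // Returns a new list with every INPUT_FAILED event ahead of every
    // DATA_MOVEMENT event. List.sort is stable, so events of the same kind
    // keep their original arrival order.
    static List<Event> reorder(List<Event> batch) {
        List<Event> sorted = new ArrayList<>(batch);
        sorted.sort((Event a, Event b) -> Integer.compare(rank(a), rank(b)));
        return sorted;
    }

    private static int rank(Event e) {
        return e.kind == Kind.INPUT_FAILED ? 0 : 1;
    }
}
```

With this ordering, a processor that sees the failure notification first can 
discard the stale DataMovementEvent for that input instead of starting work 
on it.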



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
