Rumen TopologyBuilder ignores hostname info in ReduceAttemptFinishedEvent
-------------------------------------------------------------------------

                 Key: MAPREDUCE-2269
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2269
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: tools/rumen
    Affects Versions: 0.22.0
            Reporter: Greg Roelofs
            Priority: Minor


Rumen's TopologyBuilder component attempts to build up a view of a complete 
cluster over time by processing many jobs' history files (per discussion with 
Dick King).  It appears to be designed to take a greedy approach to this, 
pulling hostnames and rack info out of any JobHistory events that have them.

In particular, it pulls split locations out of TaskStartedEvent and hostnames 
out of TaskAttemptUnsuccessfulCompletionEvent (used for all task types) and 
TaskAttemptFinishedEvent (used only for setup and cleanup task attempts).  It 
omits the hostnames in TaskAttemptStartedEvents produced by map attempts (perhaps 
intentionally, given the split info from TaskStartedEvents?) and in 
ReduceAttemptFinishedEvents (apparently unintentionally).  The latter omission 
resulted in an empty topology and an ArrayIndexOutOfBoundsException in a 
reduce-only unit test (TestTaskPerformanceSplitTranscription, modified for an 
upcoming feature).

I'm not sure whether this is intended behavior or a bug; feel free to close this 
issue if it is the former.  It looks as though TaskAttemptFinishedEvent might 
have been mistakenly believed to cover REDUCE_ATTEMPT_FINISHED.  (If so, the 
fix to TopologyBuilder.java is trivial.)
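
To illustrate the suspected cause, here is a minimal, self-contained sketch of 
type-based event dispatch.  It is not the actual Rumen code: TopologySketch, 
Builder, the handleReduceFinished flag, the hostname field and both event 
classes are hypothetical stand-ins that only borrow names from this report.  
The sketch assumes the two event classes are unrelated types, so a check for 
one never matches the other and the builder needs a separate branch.

```java
import java.util.Set;
import java.util.TreeSet;

public class TopologySketch {
    // Stand-in event types for illustration; NOT the real Hadoop classes.
    interface HistoryEvent {}

    static final class TaskAttemptFinishedEvent implements HistoryEvent {
        final String hostname;
        TaskAttemptFinishedEvent(String hostname) { this.hostname = hostname; }
    }

    // Assumption: unrelated to TaskAttemptFinishedEvent (no inheritance).
    static final class ReduceAttemptFinishedEvent implements HistoryEvent {
        final String hostname;
        ReduceAttemptFinishedEvent(String hostname) { this.hostname = hostname; }
    }

    // Hypothetical builder that collects every hostname it is shown.
    static final class Builder {
        private final Set<String> hosts = new TreeSet<>();
        private final boolean handleReduceFinished;

        Builder(boolean handleReduceFinished) {
            this.handleReduceFinished = handleReduceFinished;
        }

        void process(HistoryEvent event) {
            if (event instanceof TaskAttemptFinishedEvent) {
                hosts.add(((TaskAttemptFinishedEvent) event).hostname);
            } else if (handleReduceFinished
                    && event instanceof ReduceAttemptFinishedEvent) {
                // The extra branch that the report suggests is missing.
                hosts.add(((ReduceAttemptFinishedEvent) event).hostname);
            }
            // Any other event type is silently ignored.
        }

        Set<String> hosts() { return hosts; }
    }

    public static void main(String[] args) {
        HistoryEvent reduceDone = new ReduceAttemptFinishedEvent("/rack-a/host-1");

        Builder before = new Builder(false);
        before.process(reduceDone);

        Builder after = new Builder(true);
        after.process(reduceDone);

        System.out.println("without reduce branch: " + before.hosts());
        System.out.println("with reduce branch:    " + after.hosts());
    }
}
```

With a reduce-only input, the first builder ends up with no hosts at all, which 
matches the empty topology described above; the second picks up the host once 
the reduce event gets its own branch.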

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.