I have no Java implementation of my job, sorry.
Since it's all on the map side, IdentityMapper/IdentityReducer is
fine, as long as both the splits and the number of reduce tasks are
the same.
The data is a representation for loglines, and not exactly small,
e.g. the stuff has already been reduced once.
By "not exactly small," do you mean each line is long or that there
are many records?
The interesting thing is that it happens inside the last Map task,
not in the reducer tasks.
As you can see above the mapper cmd is rather on the simple side.
util.QuickSort is only used on the map side, so this shouldn't have
anything to do with the reduce. Is it always and only the *last* map
task that fails? If I sent you a patch that would print a trace with
the partitions, would you mind running it? Do you have any other
settings that differ from the defaults? -C