Travis Woodruff created TEZ-3582:
------------------------------------
Summary: Exception swallowed in PipelinedSorter causing incorrect results
Key: TEZ-3582
URL: https://issues.apache.org/jira/browse/TEZ-3582
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.8.4
Reporter: Travis Woodruff
I've run into a potentially serious issue with yarn-tez mapreduce.
We've recently moved from using classic mapreduce on hadoop 1.0.3 to using Tez,
and a user noticed a data inconsistency in some results calculated via yarn-tez.
On investigation, I've determined that an error occurred during key
deserialization while sorting.
In this case, {{PipelinedSorter.SpanMerger.ready()}} caught the resulting
{{ExecutionException}}, logged the message (though it should really be logging
the stack trace as well), and returned false. {{PipelinedSorter.spill()}}
interpreted the returned false as an empty spill and continued with no
indication that an error had occurred. As a result, the data that followed the
failing record in the sort buffer was lost.
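To illustrate the pattern, here is a simplified sketch. It is not the actual Tez code: the class and method names are hypothetical, and a plain {{Future}} stands in for the sort task. The first method logs the failure and returns false, which the caller cannot distinguish from an empty spill. The second rethrows with the original cause, so the task would fail instead of dropping data.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical stand-in for the sorter; none of these names come from Tez.
class SwallowedSortErrorDemo {

    // Pattern described above: the failure is logged, then reported as "false",
    // which the caller cannot tell apart from "nothing to spill".
    static boolean readySwallowing(Future<Integer> sortTask) {
        try {
            sortTask.get();
            return true;
        } catch (ExecutionException e) {
            System.err.println("Sort failed: " + e.getMessage()); // message only, no stack trace
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Alternative: surface the failure with its cause so the caller must handle it.
    static boolean readyPropagating(Future<Integer> sortTask) throws IOException {
        try {
            return sortTask.get() > 0; // false only when the span really is empty
        } catch (ExecutionException e) {
            throw new IOException("Sort failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for sort", e);
        }
    }

    public static void main(String[] args) {
        // Simulate a sort task that failed during key deserialization.
        CompletableFuture<Integer> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("bad key bytes"));

        System.out.println("swallowing: ready=" + readySwallowing(failed));
        try {
            readyPropagating(failed);
        } catch (IOException e) {
            System.out.println("propagating: " + e.getMessage() + ", cause: " + e.getCause());
        }
    }
}
```

Whether the real fix should rethrow as {{IOException}} or some other type is an open question; the point is only that a boolean return value cannot carry both "empty" and "failed".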
I suspect that there may also be an error somewhere else in the sort code that
is causing buffer corruption (or index corruption), since we've been using this
mapreduce code for years and have never seen a deserialization error here;
however, I can't confirm that there isn't a subtle error on our side.
In any case, the fact that Tez is silently swallowing errors is a critical
issue for us, as we can't trust the results it produces.