Partitioner happily accepts negative partition numbers and data gets lost in the
Hadoop framework
---------------------------------------------------------------------------------------
Key: HADOOP-3425
URL: https://issues.apache.org/jira/browse/HADOOP-3425
Project: Hadoop Core
Issue Type: Bug
Reporter: Amir Youssefi
When a Partitioner returns a negative partition number, the framework happily
accepts it. The data goes to the wrong location and (many) reducers receive zero
data. Suggested resolutions:
1) Prevent the problem from the start. The partitioner checks the range and throws
an exception if the partition number is out of range.
2) Have a more generic check: compare counters to see whether all data gets past
the Shuffle stage, i.e. that nothing leaks. Per feedback we got from Owen, this
idea gets a bit complicated once combiners are taken into account.
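
A minimal sketch of what resolution 1 could look like, written as a standalone
helper so it runs without Hadoop on the classpath. The class name
PartitionRangeCheck, the method checkPartition, and the choice of
IllegalArgumentException are assumptions made for illustration only; they are
not existing framework code.

```java
public final class PartitionRangeCheck {
    private PartitionRangeCheck() {}

    /**
     * Hypothetical guard: returns the partition unchanged when it lies in
     * [0, numPartitions), otherwise fails loudly instead of silently
     * misrouting the record.
     */
    public static int checkPartition(int partition, int numPartitions) {
        if (partition < 0 || partition >= numPartitions) {
            throw new IllegalArgumentException(
                "Illegal partition " + partition
                + " (expected 0 <= partition < " + numPartitions + ")");
        }
        return partition;
    }

    public static void main(String[] args) {
        // A valid partition passes through untouched.
        System.out.println(checkPartition(2, 4));

        // A negative partition is rejected immediately.
        try {
            checkPartition(-1, 4);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at the point where the partition number is produced keeps the error
close to the faulty partitioner, rather than surfacing later as empty reducers.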
Example: using my_id.hashCode() % numPartitions can produce negative numbers,
because hashCode() may be negative, and the data gets lost in the framework.
Reducers get zero rows (while the data is actually sitting in partitions indexed
with negative numbers).
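
The arithmetic behind the example can be reproduced in isolation. The sketch
below is illustrative only: the class and method names are hypothetical, and an
Integer key is used as a stand-in for my_id because Integer.hashCode() simply
returns the value, which makes the negative case easy to see. It contrasts the
naive modulo with a variant that masks off the sign bit before taking the
remainder.

```java
public final class NegativePartitionDemo {
    private NegativePartitionDemo() {}

    /** Naive version from the report: keeps the sign of hashCode(). */
    public static int naivePartition(Object key, int numPartitions) {
        return key.hashCode() % numPartitions;
    }

    /** Clears the sign bit first, so the result is always in [0, numPartitions). */
    public static int maskedPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        Object myId = Integer.valueOf(-7); // hashCode() is -7
        int numPartitions = 4;

        // Java's % takes the sign of the dividend.
        System.out.println(naivePartition(myId, numPartitions));  // prints -3
        System.out.println(maskedPartition(myId, numPartitions)); // prints 1
    }
}
```

Masking is used here rather than Math.abs because
Math.abs(Integer.MIN_VALUE) is still negative, so Math.abs alone would not rule
out a negative partition number.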