[
https://issues.apache.org/jira/browse/MAPREDUCE-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Oliver Hummel updated MAPREDUCE-7085:
-------------------------------------
Summary: while loop in InputSampler.writePartitionFile method is never
entered (was: while loop in InputSampler.writePartitionFile method does not
make sense)
> while loop in InputSampler.writePartitionFile method is never entered
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-7085
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7085
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.8.3
> Reporter: Oliver Hummel
> Priority: Minor
>
> After getting the "split points are out of order" exception from
> TotalOrderPartitioner, I dug into the source of the
> org.apache.hadoop.mapreduce.lib.partition.InputSampler class and found that
> the while loop in line 335 is never entered.
> The reason is that the variable last is always smaller than k, while the loop
> condition requires last to be greater than or equal to k.
> I am not completely sure of the original purpose of this loop. If it is what
> I assume, namely reducing the occurrences of identical split points, then I
> would change it like so:
> while (last != -1 && k > last
>     && comparator.compare(samples[last], samples[k]) == 0) {
>   --k;
> }
> However, this only slightly mitigates the problem: a highly skewed
> distribution of keys might still lead to identical split points, so
> further measures might be necessary.
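
For illustration, the behaviour described above can be reproduced outside Hadoop with a minimal sketch. It is not the InputSampler source: the class and method names are hypothetical, int keys stand in for the sampled keys and their comparator, and the stepSize computation and the ++k loop body are assumptions about the code surrounding the loop condition quoted in the report. It assumes there are at least as many samples as partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone demo; not part of Hadoop.
class SplitPointLoopDemo {

    // Picks split-point indices from sorted samples using the loop condition
    // described in the report. loopEntries[0] counts how often the while
    // body actually runs.
    static List<Integer> chooseIndices(int[] samples, int numPartitions, int[] loopEntries) {
        List<Integer> chosen = new ArrayList<>();
        float stepSize = samples.length / (float) numPartitions; // assumed
        int last = -1;
        for (int i = 1; i < numPartitions; ++i) {
            int k = Math.round(stepSize * i);
            // Condition as reported: requires last >= k, but k grows with i.
            while (last >= k && samples[last] == samples[k]) {
                ++k; // assumed loop body
                loopEntries[0]++;
            }
            chosen.add(k);
            last = k;
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Skewed, already sorted sample: the key 1 dominates.
        int[] samples = {1, 1, 1, 1, 1, 1, 1, 2, 3, 4};
        int[] loopEntries = new int[1];
        List<Integer> indices = chooseIndices(samples, 4, loopEntries);
        System.out.println("indices = " + indices);
        System.out.println("while body executed " + loopEntries[0] + " times");
        for (int idx : indices) {
            System.out.println("split point: " + samples[idx]);
        }
    }
}
```

With these ten samples and four partitions, the chosen indices are 3, 5 and 8. The while body never runs, and the first two split points are both the key 1, which is the kind of duplicate that a check for strictly increasing split points would reject.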