[
https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602752#action_12602752
]
Chris Douglas commented on HADOOP-3442:
---------------------------------------
bq. if p+r < 0, it is a problem.
Until the first compare, when the IndexedSortable (i.e. MapOutputBuffer) will
throw an ArrayIndexOutOfBoundsException.
bq. Is this a new bug? Has the sort code been change recently or something that
feeds it?
The QuickSort is new in 0.17; we used to use MergeSort.
bq. Don't the classic texts have chapters on pivot selection?
The default HashPartitioner- which almost everybody uses- should produce fairly
random distributions, save equal keys (which is why increasing the number of
reducers usually makes this go away). It was verified that the
"median-of-three" partitioning would be sufficient to handle the sorted,
single-reducer case, which is the canonical worst-case for quicksort. In the
literature reviewed, cases where "median-of-three" partitioning fails are
spoken of as being deliberately crafted, i.e. as possible DoS vectors, not as
something common to real-world data. Someone was able to get a sharable spill
and indices, so we'll do some analysis to figure out why it's more common than
we thought it would be.
bq. Randomize the input data before calling quicksort (O\(n)) cost
Randomizing the indices should be pretty cheap, so this is probably a good
idea. Since we're confident that this really is a problem with the recursion,
we should also investigate Chad's suggestion of Introsort. This is what the STL
uses: http://www.sgi.com/tech/stl/sort.html (see [2])
> QuickSort may get into unbounded recursion
> ------------------------------------------
>
> Key: HADOOP-3442
> URL: https://issues.apache.org/jira/browse/HADOOP-3442
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.0
> Reporter: Runping Qi
> Assignee: Chris Douglas
> Attachments: 3442-0.patch, 3442-0v17.patch, CheckSortBuffer.java,
> HADOOP-3442.patch, overflow.zip, spillbuffers.patch
>
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.