[ 
https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602752#action_12602752
 ] 

Chris Douglas commented on HADOOP-3442:
---------------------------------------

bq. if p+r < 0, it is a problem.

Until the first compare, when the IndexedSortable (i.e. MapOutputBuffer) will 
throw an ArrayIndexOutOfBoundsException.

bq. Is this a new bug? Has the sort code been change recently or something that 
feeds it?

The QuickSort is new in 0.17; we used to use MergeSort.

bq. Don't the classic texts have chapters on pivot selection?

The default HashPartitioner- which almost everybody uses- should produce fairly 
random distributions, save equal keys (which is why increasing the number of 
reducers usually makes this go away). It was verified that the 
"median-of-three" partitioning would be sufficient to handle the sorted, 
single-reducer case, which is the canonical worst-case for quicksort. In the 
literature reviewed, cases where "median-of-three" partitioning fails are 
spoken of as being deliberately crafted, i.e. as possible DoS vectors, not as 
something common to real-world data. Someone was able to get a sharable spill 
and indices, so we'll do some analysis to figure out why it's more common than 
we thought it would be.

bq. Randomize the input data before calling quicksort (O\(n)) cost

Randomizing the indices should be pretty cheap, so this is probably a good 
idea. Since we're confident that this really is a problem with the recursion, 
we should also investigate Chad's suggestion of Introsort. This is what the STL 
uses: http://www.sgi.com/tech/stl/sort.html (see [2])

> QuickSort may get into unbounded recursion
> ------------------------------------------
>
>                 Key: HADOOP-3442
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3442
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Runping Qi
>            Assignee: Chris Douglas
>         Attachments: 3442-0.patch, 3442-0v17.patch, CheckSortBuffer.java, 
> HADOOP-3442.patch, overflow.zip, spillbuffers.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to