[ 
https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602823#action_12602823
 ] 

Chris Douglas commented on HADOOP-3442:
---------------------------------------

Analysis of the data (thanks to everyone who provided their test cases) led us 
to consider the following degenerate case:

Consider a partition:
{noformat}
a_n, a_1, a_2, ... , a_n-2, a_n-1
{noformat}

where {{a_1 ... a_n-1}} are sorted and {{a_n}} is the largest element. 
Median-of-three partitioning will consider {{a_n}}, {{a_n/2}}, and {{a_n-1}} 
and select {{a_n-1}} as the pivot. Once the pivot has been moved to the 
front, the partition looks like:
{noformat}
a_n-1, a_1, a_2, ... , a_n-2, a_n
{noformat}

The left index will run all the way to {{a_n}}, and the pivot is then swapped 
into place, yielding the following:
{noformat}
a_n-2, a_1, a_2, ... , a_n-3, a_n-1, a_n
{noformat}

So the next partition will get:
{noformat}
a_n-2, a_1, a_2, ... , a_n-4, a_n-3
{noformat}
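
The walk-through above can be reproduced with a small sketch. This is not the 
Hadoop QuickSort; it is a plain median-of-three quicksort over an int array 
that moves the pivot to the front, and every name in it 
({{DegenerateQuickSortDemo}}, {{degenerateInput}}, {{sortAndMeasureDepth}}) is 
hypothetical. It only records how deep the recursion goes for the input 
described here versus already sorted input:

```java
class DegenerateQuickSortDemo {
    private static int maxDepth;

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    /** Builds a_n, a_1, a_2, ..., a_n-1 with a_k = k. */
    static int[] degenerateInput(int n) {
        int[] a = new int[n];
        a[0] = n;
        for (int i = 1; i < n; i++) a[i] = i;
        return a;
    }

    /** Builds a_1, a_2, ..., a_n with a_k = k. */
    static int[] sortedInput(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i + 1;
        return a;
    }

    /** Sorts a in place and returns the deepest recursion level reached. */
    static int sortAndMeasureDepth(int[] a) {
        maxDepth = 0;
        sort(a, 0, a.length - 1, 1);
        return maxDepth;
    }

    private static void sort(int[] a, int lo, int hi, int depth) {
        if (hi <= lo) return;
        maxDepth = Math.max(maxDepth, depth);
        int mid = (lo + hi) >>> 1;
        // Median of three: order so that a[mid] <= a[lo] <= a[hi],
        // which leaves the median (the pivot) at the front.
        if (a[mid] > a[lo]) swap(a, mid, lo);
        if (a[mid] > a[hi]) swap(a, mid, hi);
        if (a[lo] > a[hi]) swap(a, lo, hi);
        int pivot = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < pivot) if (i == hi) break; // left index
            while (a[--j] > pivot) if (j == lo) break; // right index
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j); // swap the pivot into place
        sort(a, lo, j - 1, depth + 1);
        sort(a, j + 1, hi, depth + 1);
    }

    public static void main(String[] args) {
        int n = 2000;
        System.out.println("degenerate depth: "
            + sortAndMeasureDepth(degenerateInput(n)));
        System.out.println("sorted depth:     "
            + sortAndMeasureDepth(sortedInput(n)));
    }
}
```

In this sketch each partition of the degenerate input removes only the pivot 
and {{a_n}}, so the depth grows linearly with the input, while sorted input 
splits in half at every level.
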
So while sorted data will yield a series of optimal partitions, nearly sorted 
data like this sends the sort into a degenerate case: each partition removes 
only two elements, so the recursion depth grows linearly with the size of the 
input. Among the suggestions to ameliorate this:
# Use the midpoint and two random offsets as the candidates for the 
median-of-three partitioning (or three random offsets, etc.)
# Always pick a random pivot
# After swapping the pivot into place, swap what it replaced into a random 
position in the left partition
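
As an illustration of the first suggestion only, here is a minimal sketch that 
takes the pivot as the median of the midpoint element and two elements at 
random offsets. It assumes "the median" in that item means the midpoint of the 
partition; the class and method names are hypothetical and the code sorts a 
plain int array rather than using Hadoop's sort interfaces:

```java
import java.util.Random;

class RandomizedMedianOfThree {
    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    /** Returns whichever of i, j, k indexes the median of a[i], a[j], a[k]. */
    static int medianIndex(int[] a, int i, int j, int k) {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) return j;
            return a[i] < a[k] ? k : i;
        } else {
            if (a[i] < a[k]) return i;
            return a[j] < a[k] ? k : j;
        }
    }

    /** Sorts a in place, drawing the random offsets from rnd. */
    static void sort(int[] a, Random rnd) {
        sort(a, 0, a.length - 1, rnd);
    }

    private static void sort(int[] a, int lo, int hi, Random rnd) {
        if (hi <= lo) return;
        int span = hi - lo + 1;
        int mid = (lo + hi) >>> 1;
        int r1 = lo + rnd.nextInt(span); // random offset into the partition
        int r2 = lo + rnd.nextInt(span); // second random offset
        // Move the median of the three candidates to the front as the pivot.
        swap(a, lo, medianIndex(a, mid, r1, r2));
        int pivot = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < pivot) if (i == hi) break;
            while (a[--j] > pivot) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j); // swap the pivot into place
        sort(a, lo, j - 1, rnd);
        sort(a, j + 1, hi, rnd);
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] a = new int[n];
        a[0] = n; // a_n, a_1, ..., a_n-1
        for (int i = 1; i < n; i++) a[i] = i;
        sort(a, new Random(42));
        boolean sorted = true;
        for (int i = 1; i < n; i++) sorted &= a[i - 1] <= a[i];
        System.out.println("sorted: " + sorted);
    }
}
```

Because two of the three candidates no longer depend on the layout of the 
input, a fixed arrangement such as the one above cannot force a poor pivot at 
every level; a poor pivot remains possible but unlikely.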

Randomizing the input data makes this case far less common, and Introsort 
treats it as an inevitable degenerate case to be detected and handled; both 
are also sound additions.
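
For reference, the Introsort idea can be sketched as follows: cap the 
recursion depth at roughly {{2 * log2(n)}} and hand any partition that exceeds 
the cap to heapsort. This is a generic illustration over an int array under 
those assumptions, not a proposed patch; {{IntroSortSketch}} and the 
{{fallbacks}} counter are hypothetical names added only to make the behaviour 
observable:

```java
class IntroSortSketch {
    /** Number of partitions handed to heapsort (for illustration only). */
    static int fallbacks;

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    static void sort(int[] a) {
        if (a.length < 2) return;
        // Depth budget: 2 * floor(log2(n)).
        int budget = 2 * (31 - Integer.numberOfLeadingZeros(a.length));
        sort(a, 0, a.length - 1, budget);
    }

    private static void sort(int[] a, int lo, int hi, int depthLeft) {
        if (hi <= lo) return;
        if (depthLeft == 0) { // budget exhausted: stop recursing
            fallbacks++;
            heapSort(a, lo, hi);
            return;
        }
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1, depthLeft - 1);
        sort(a, j + 1, hi, depthLeft - 1);
    }

    /** Median-of-three partition with the pivot moved to the front. */
    private static int partition(int[] a, int lo, int hi) {
        int mid = (lo + hi) >>> 1;
        if (a[mid] > a[lo]) swap(a, mid, lo);
        if (a[mid] > a[hi]) swap(a, mid, hi);
        if (a[lo] > a[hi]) swap(a, lo, hi);
        int pivot = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < pivot) if (i == hi) break;
            while (a[--j] > pivot) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    /** Heapsort restricted to a[lo..hi], inclusive. */
    private static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, lo, i, n);
        for (int end = n - 1; end > 0; end--) {
            swap(a, lo, lo + end); // move the current maximum to the end
            siftDown(a, lo, 0, end);
        }
    }

    private static void siftDown(int[] a, int base, int root, int n) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= n) return;
            if (child + 1 < n && a[base + child] < a[base + child + 1]) child++;
            if (a[base + root] >= a[base + child]) return;
            swap(a, base + root, base + child);
            root = child;
        }
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] a = new int[n];
        a[0] = n; // a_n, a_1, ..., a_n-1
        for (int i = 1; i < n; i++) a[i] = i;
        sort(a);
        System.out.println("heapsort fallbacks: " + fallbacks);
    }
}
```

With the degenerate input described earlier, the depth budget runs out while 
most of the array is still unsorted and heapsort finishes the job, so the 
recursion stays bounded regardless of how the pivots fall.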

> QuickSort may get into unbounded recursion
> ------------------------------------------
>
>                 Key: HADOOP-3442
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3442
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Runping Qi
>            Assignee: Chris Douglas
>         Attachments: 3442-0.patch, 3442-0v17.patch, CheckSortBuffer.java, 
> HADOOP-3442.patch, overflow.zip, spillbuffers.patch
>
>


