[ https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602823#action_12602823 ]
Chris Douglas commented on HADOOP-3442:
---------------------------------------
Analysis of the data (thanks to everyone who provided their test cases) led us
to consider the following degenerate case:
Consider a partition:
{noformat}
a_n, a_1, a_2, ... , a_n-2, a_n-1
{noformat}
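where {{a_1 ... a_n-1}} are sorted and {{a_n}} is larger than all of them.
Median-of-three partitioning will consider the first, middle, and last
elements ({{a_n}}, {{a_n/2}}, and {{a_n-1}}) and select {{a_n-1}} as the pivot.
Once the pivot has been moved to the front, the sort runs on: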
{noformat}
a_n-1, a_1, a_2, ... , a_n-2, a_n
{noformat}
The left index will run all the way to {{a_n}}, and the pivot is then swapped
into place, yielding the following:
{noformat}
a_n-2, a_1, a_2, ... , a_n-3, a_n-1, a_n
{noformat}
So the next partition will get:
{noformat}
a_n-2, a_1, a_2, ... , a_n-4, a_n-3
{noformat}
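The progression above can be reproduced with a small sketch. This is an illustrative quicksort written for this comment, not the Hadoop {{QuickSort}} code; the class and method names ({{DegenerateQuickSortSketch}}, {{medianIndex}}, {{depthOf}}, {{degenerate}}) are hypothetical, and it assumes the pivot is moved to the first position before partitioning, as in the trace above.

```java
// Illustrative sketch only; not the Hadoop QuickSort implementation.
class DegenerateQuickSortSketch {
    static int maxDepth; // deepest recursion level seen during the current sort

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Returns whichever of the indices i, j, k holds the median of the three values.
    static int medianIndex(int[] a, int i, int j, int k) {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) return j;
            return (a[i] < a[k]) ? k : i;
        }
        if (a[i] < a[k]) return i;
        return (a[j] < a[k]) ? k : j;
    }

    // Sorts a[lo..hi] (inclusive) and records the recursion depth reached.
    static void sort(int[] a, int lo, int hi, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        if (hi - lo < 1) return;
        int mid = lo + (hi - lo) / 2;
        swap(a, lo, medianIndex(a, lo, mid, hi)); // move the pivot to the front
        int pivot = a[lo];
        int i = lo + 1;
        int j = hi;
        while (true) {
            while (i <= j && a[i] < pivot) i++; // left index scans right
            while (i <= j && a[j] > pivot) j--; // right index scans left
            if (i >= j) break;
            swap(a, i, j);
            i++;
            j--;
        }
        swap(a, lo, j); // swap the pivot into its final place
        sort(a, lo, j - 1, depth + 1);
        sort(a, j + 1, hi, depth + 1);
    }

    // Builds a_n, a_1, ..., a_n-1 using the values n, 1, 2, ..., n-1 (n >= 1).
    static int[] degenerate(int n) {
        int[] a = new int[n];
        a[0] = n;
        for (int i = 1; i < n; i++) a[i] = i;
        return a;
    }

    // Builds the fully sorted input 1, 2, ..., n.
    static int[] sorted(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i + 1;
        return a;
    }

    // Sorts the array in place and returns the maximum recursion depth.
    static int depthOf(int[] a) {
        maxDepth = 0;
        sort(a, 0, a.length - 1, 1);
        return maxDepth;
    }

    public static void main(String[] args) {
        int n = 2000;
        System.out.println("sorted input depth:     " + depthOf(sorted(n)));
        System.out.println("degenerate input depth: " + depthOf(degenerate(n)));
    }
}
```

In this sketch each pass on the degenerate input removes only the pivot and {{a_n}}, so the depth grows on the order of n/2, while the sorted input splits evenly and stays logarithmic.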
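This has the same shape as the original partition with only two elements
removed, so the recursion depth grows linearly with the partition size. So
while sorted data will yield a series of optimal partitions, nearly sorted
data like this can cause the sort to fall into a degenerate case. Among the
suggestions to ameliorate this: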
# Consider the median and two random offsets for the median-of-three
partitioning (or three random offsets, etc.)
# Always pick a random pivot
# After swapping the pivot into place, swap what it replaced into a random
position in the left partition
Randomizing the input data makes this case far less common, and Introsort
treats it as an inevitable degenerate case (switching to heapsort once the
recursion depth exceeds a limit); both are also sound additions.
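As a rough illustration of the first suggestion, the sketch below takes the median of the middle element and two random offsets. It is an assumption-laden example, not the attached patch: the class name {{RandomizedMedianSketch}}, its helpers, and the fixed seed are all hypothetical, and the partition loop is a plain textbook one.

```java
import java.util.Random;

// Illustrative sketch only; not the HADOOP-3442 patch.
class RandomizedMedianSketch {
    // Fixed seed (an arbitrary choice) so that repeated runs behave identically.
    private static final Random RND = new Random(42);
    static int maxDepth; // deepest recursion level seen during the current sort

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Returns whichever of the indices i, j, k holds the median of the three values.
    static int medianIndex(int[] a, int i, int j, int k) {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) return j;
            return (a[i] < a[k]) ? k : i;
        }
        if (a[i] < a[k]) return i;
        return (a[j] < a[k]) ? k : j;
    }

    // Sorts a[lo..hi] (inclusive) and records the recursion depth reached.
    static void sort(int[] a, int lo, int hi, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        if (hi - lo < 1) return;
        int mid = lo + (hi - lo) / 2;
        // Two random offsets inside the partition replace the first and last elements.
        int r1 = lo + RND.nextInt(hi - lo + 1);
        int r2 = lo + RND.nextInt(hi - lo + 1);
        swap(a, lo, medianIndex(a, r1, mid, r2)); // move the pivot to the front
        int pivot = a[lo];
        int i = lo + 1;
        int j = hi;
        while (true) {
            while (i <= j && a[i] < pivot) i++;
            while (i <= j && a[j] > pivot) j--;
            if (i >= j) break;
            swap(a, i, j);
            i++;
            j--;
        }
        swap(a, lo, j); // swap the pivot into its final place
        sort(a, lo, j - 1, depth + 1);
        sort(a, j + 1, hi, depth + 1);
    }

    // Builds a_n, a_1, ..., a_n-1 using the values n, 1, 2, ..., n-1 (n >= 1).
    static int[] degenerate(int n) {
        int[] a = new int[n];
        a[0] = n;
        for (int i = 1; i < n; i++) a[i] = i;
        return a;
    }

    // Sorts the array in place and returns the maximum recursion depth.
    static int depthOf(int[] a) {
        maxDepth = 0;
        sort(a, 0, a.length - 1, 1);
        return maxDepth;
    }

    public static void main(String[] args) {
        System.out.println("degenerate input depth: " + depthOf(degenerate(2000)));
    }
}
```

Because the sampled positions no longer depend on where {{a_n}} sits, the degenerate input above should no longer drive the recursion to linear depth, although an unlucky sequence of samples can still produce poor partitions.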
> QuickSort may get into unbounded recursion
> ------------------------------------------
>
> Key: HADOOP-3442
> URL: https://issues.apache.org/jira/browse/HADOOP-3442
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.0
> Reporter: Runping Qi
> Assignee: Chris Douglas
> Attachments: 3442-0.patch, 3442-0v17.patch, CheckSortBuffer.java,
> HADOOP-3442.patch, overflow.zip, spillbuffers.patch
>
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.