[
https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603792#action_12603792
]
chris.douglas edited comment on HADOOP-3442 at 6/10/08 12:53 AM:
-----------------------------------------------------------------
bq. QuickSort.getMaxDepth can be more aggressive: since the index are int, we
have n <= 2^32 and lg n <= 32. So, we could safely let QuickSort.getMaxDepth be
something around (lg n)^3 or it should depend on the maximum size of system
stack.
The 2*log\(n) heuristic was intended to bail out of a worst case, not to
protect against the StackOverflowError. The recursion on the system stack is
already limited to log\(n), but quicksort will always have O(n^2) cases that
this needs to handle. I'm happy to discuss an appropriate value for k, but
k*log\(n) seems appropriate to this purpose.
bq. HeapSort is a singleton. Define a static variable HeapSort.INSTANCE. Then,
we don't have to new HeapSort() in QuickSort.
bq. There are quite a few sorting classes/interfaces. Consider create a new
package for them.
Of course, you're right; all the algorithms should be singletons, and further,
having sort algorithms as objects at all is not the best design. Still, given
that MapTask remains the only user, adding factories, moving these into a
separate package, etc. seems like overkill at this point. Would it be
sufficient to add a singleton instance of HeapSort to QuickSort?
bq. public methods HeapSort.sort(IndexedSortable s, int p, int r) and
QuickSort.sort(...) need javadoc.
OK.
bq. Remove BatcherSort in this patch. Create a new issue if it is useful.
Fair enough. BatcherSort is quick for small sets of data (particularly when
compared to QuickSort, which uses an insertion sort for small partitions), but
small datasets aren't exactly a typical use case. :)
was (Author: chris.douglas):
bq. QuickSort.getMaxDepth can be more aggressive: since the index are int,
we have n <= 2^32 and lg n <= 32. So, we could safely let QuickSort.getMaxDepth
be something around (lg n)^3 or it should depend on the maximum size of system
stack.
The 2*log\(n) heuristic was intended to bail out of a worst case, not to
protect against the StackOverflowError. The recursion on the system stack is
already limited to log\(n), but quicksort will always have O(n^2) cases that
this needs to handle. I'm happy to discuss an appropriate value for k, but
k*log\(n) seems appropriate to this purpose.
bq. HeapSort is a singleton. Define a static variable HeapSort.INSTANCE. Then,
we don't have to new HeapSort() in QuickSort.
bq. There are quite a few sorting classes/interfaces. Consider create a new
package for them.
Of course, you're right; all the algorithms should be singletons, and further,
having sort algorithms as objects at all is not the best design. Still, given
that MapTask remains the only user, adding factories, moving these into a
separate package, etc. seems like overkill at this point. Would it be
sufficient to add a singleton instance of HeapSort to QuickSort?
bq. public methods HeapSort.sort(IndexedSortable s, int p, int r) and
QuickSort.sort(...) need javadoc.
OK.
bq. Remove BatcherSort in this patch. Create a new issue if it is useful.
Fair enough. BatcherSort is quick for small sets of data (particularly when
compared to QuickSort, which uses an insertion sort for small partitions), but
small datasets aren't exactly a typical use case. :)
The quicksort implementation might also benefit from some adaptive
partitioning, but that should probably be left to a separate issue.
> QuickSort may get into unbounded recursion
> ------------------------------------------
>
> Key: HADOOP-3442
> URL: https://issues.apache.org/jira/browse/HADOOP-3442
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.0
> Reporter: Runping Qi
> Assignee: Chris Douglas
> Priority: Blocker
> Fix For: 0.17.1, 0.18.0
>
> Attachments: 3442-0.patch, 3442-0v17.patch, 3442-1.patch,
> 3442-2.patch, 3442-3.patch, CheckSortBuffer.java, HADOOP-3442.patch,
> overflow.zip, spillbuffers.patch
>
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.