[ 
https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603792#action_12603792
 ] 

chris.douglas edited comment on HADOOP-3442 at 6/10/08 12:53 AM:
-----------------------------------------------------------------

bq.  QuickSort.getMaxDepth can be more aggressive: since the index are int, we 
have n <= 2^32 and lg n <= 32. So, we could safely let QuickSort.getMaxDepth be 
something around (lg n)^3 or it should depend on the maximum size of system 
stack.

The 2*log\(n) heuristic was intended to bail out of a worst case, not to 
protect against the StackOverflowError. The recursion on the system stack is 
already limited to log\(n), but quicksort will always have O(n^2) cases that 
this needs to handle. I'm happy to discuss an appropriate value for k, but 
k*log\(n) seems appropriate to this purpose.

bq. HeapSort is a singleton. Define a static variable HeapSort.INSTANCE. Then, 
we don't have to new HeapSort() in QuickSort.
bq. There are quite a few sorting classes/interfaces. Consider create a new 
package for them.

Of course, you're right; all the algorithms should be singletons, and further, 
having sort algorithms as objects at all is not the best design. Still, given 
that MapTask remains the only user, adding factories, moving these into a 
separate package, etc. seems like overkill at this point. Would it be 
sufficient to add a singleton instance of HeapSort to QuickSort?

bq. public methods HeapSort.sort(IndexedSortable s, int p, int r) and 
QuickSort.sort(...) need javadoc.

OK.

bq. Remove BatcherSort in this patch. Create a new issue if it is useful.

Fair enough. BatcherSort is quick for small sets of data (particularly when 
compared to QuickSort, which uses an insertion sort for small partitions), but 
small datasets aren't exactly a typical use case. :)

      was (Author: chris.douglas):
    bq.  QuickSort.getMaxDepth can be more aggressive: since the index are int, 
we have n <= 2^32 and lg n <= 32. So, we could safely let QuickSort.getMaxDepth 
be something around (lg n)^3 or it should depend on the maximum size of system 
stack.

The 2*log\(n) heuristic was intended to bail out of a worst case, not to 
protect against the StackOverflowError. The recursion on the system stack is 
already limited to log\(n), but quicksort will always have O(n^2) cases that 
this needs to handle. I'm happy to discuss an appropriate value for k, but 
k*log\(n) seems appropriate to this purpose.

bq. HeapSort is a singleton. Define a static variable HeapSort.INSTANCE. Then, 
we don't have to new HeapSort() in QuickSort.
bq. There are quite a few sorting classes/interfaces. Consider create a new 
package for them.

Of course, you're right; all the algorithms should be singletons, and further, 
having sort algorithms as objects at all is not the best design. Still, given 
that MapTask remains the only user, adding factories, moving these into a 
separate package, etc. seems like overkill at this point. Would it be 
sufficient to add a singleton instance of HeapSort to QuickSort?

bq. public methods HeapSort.sort(IndexedSortable s, int p, int r) and 
QuickSort.sort(...) need javadoc.

OK.

bq. Remove BatcherSort in this patch. Create a new issue if it is useful.

Fair enough. BatcherSort is quick for small sets of data (particularly when 
compared to QuickSort, which uses an insertion sort for small partitions), but 
small datasets aren't exactly a typical use case. :)

The quicksort implementation might also benefit from some adaptive 
partitioning, but that should probably be left to a separate issue.
  
> QuickSort may get into unbounded recursion
> ------------------------------------------
>
>                 Key: HADOOP-3442
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3442
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Runping Qi
>            Assignee: Chris Douglas
>            Priority: Blocker
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: 3442-0.patch, 3442-0v17.patch, 3442-1.patch, 
> 3442-2.patch, 3442-3.patch, CheckSortBuffer.java, HADOOP-3442.patch, 
> overflow.zip, spillbuffers.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to