Hyunsik Choi created TAJO-36:
--------------------------------

             Summary: Improve ExternalSortExec with N-merge sort and final pass 
omission
                 Key: TAJO-36
                 URL: https://issues.apache.org/jira/browse/TAJO-36
             Project: Tajo
          Issue Type: Improvement
          Components: physical operator
            Reporter: Hyunsik Choi


Background:

The current ExternalSortExec just uses the binary external merge sort algorithm 
(http://en.wikipedia.org/wiki/External_sorting#External_merge_sort). In other 
words, for each pass, ExternalSortExec just merges two files into one sorted 
file.

Proposal:
The goal of this proposal is to improve ExternalSortExec with the following 
improvements:
* N-merge sort - we can merge N files though more memory at each pass. It will 
reduce the number of passes. Consequently, it will reduces considerable I/O 
overheads.
* the final pass omission - a physical operator is pipelined by the parent 
operator. The final pass of the merge sort must also be invoked by the parent 
physical operator. So, we can omit the final pass of the merge sort.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to