[
https://issues.apache.org/jira/browse/TAJO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyunsik Choi updated TAJO-36:
-----------------------------
Labels: gsoc gsoc2013 mentor (was: gsoc2013 mentor)
> Improve ExternalSortExec with N-merge sort and final pass omission
> ------------------------------------------------------------------
>
> Key: TAJO-36
> URL: https://issues.apache.org/jira/browse/TAJO-36
> Project: Tajo
> Issue Type: Improvement
> Components: physical operator
> Reporter: Hyunsik Choi
> Labels: gsoc, gsoc2013, mentor
>
> Background:
> The current ExternalSortExec just uses the binary external merge sort
> algorithm
> (http://en.wikipedia.org/wiki/External_sorting#External_merge_sort). In other
> words, for each pass, ExternalSortExec just merges two files into one sorted
> file.
> Proposal:
> The goal of this proposal is to improve ExternalSortExec with the following
> improvements:
> * N-merge sort - we can merge N files though more memory at each pass. It
> will reduce the number of passes. Consequently, it will reduces considerable
> I/O overheads.
> * the final pass omission - a physical operator is pipelined by the parent
> operator. The final pass of the merge sort must also be invoked by the parent
> physical operator. So, we can omit the final pass of the merge sort.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira