[ 
https://issues.apache.org/jira/browse/TAJO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi reassigned TAJO-36:
--------------------------------

    Assignee: Hyunsik Choi

> Improve ExternalSortExec with N-merge sort and final pass omission
> ------------------------------------------------------------------
>
>                 Key: TAJO-36
>                 URL: https://issues.apache.org/jira/browse/TAJO-36
>             Project: Tajo
>          Issue Type: Improvement
>          Components: physical operator
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>            Priority: Critical
>              Labels: gsoc, gsoc2013, mentor
>
> Background:
> The current ExternalSortExec just uses the binary external merge sort 
> algorithm 
> (http://en.wikipedia.org/wiki/External_sorting#External_merge_sort). In other 
> words, for each pass, ExternalSortExec just merges two files into one sorted 
> file.
> Proposal:
> The goal of this proposal is to improve ExternalSortExec with the following 
> improvements:
> * N-merge sort - we can merge N files though more memory at each pass. It 
> will reduce the number of passes. Consequently, it will reduces considerable 
> I/O overheads.
> * the final pass omission - a physical operator is pipelined by the parent 
> operator. The final pass of the merge sort must also be invoked by the parent 
> physical operator. So, we can omit the final pass of the merge sort.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to