[ 
https://issues.apache.org/jira/browse/TAJO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635727#comment-13635727
 ] 

Alexander Sibetheros commented on TAJO-36:
------------------------------------------

Hello,
I read the above summary of the project requested and it seems rather 
interesting. I haven't seen how apache implements ExternalSortExec but will do 
so shortly.
I am a 4th year student in Informatics(University of Athens), with great 
background in algorithms(especially sorting), strong c,c++ skills and recently 
took part in the Sigmod 2013 programming contest(results pending), which 
required lots of research into algorithms and fast indexing and sorting 
mechanisms. 
Could you please point me to the correct place in the code to see how the sort 
is implemented?
                
> Improve ExternalSortExec with N-merge sort and final pass omission
> ------------------------------------------------------------------
>
>                 Key: TAJO-36
>                 URL: https://issues.apache.org/jira/browse/TAJO-36
>             Project: Tajo
>          Issue Type: Improvement
>          Components: physical operator
>            Reporter: Hyunsik Choi
>              Labels: gsoc, gsoc2013, mentor
>
> Background:
> The current ExternalSortExec just uses the binary external merge sort 
> algorithm 
> (http://en.wikipedia.org/wiki/External_sorting#External_merge_sort). In other 
> words, for each pass, ExternalSortExec just merges two files into one sorted 
> file.
> Proposal:
> The goal of this proposal is to improve ExternalSortExec with the following 
> improvements:
> * N-merge sort - we can merge N files though more memory at each pass. It 
> will reduce the number of passes. Consequently, it will reduces considerable 
> I/O overheads.
> * the final pass omission - a physical operator is pipelined by the parent 
> operator. The final pass of the merge sort must also be invoked by the parent 
> physical operator. So, we can omit the final pass of the merge sort.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to