[ https://issues.apache.org/jira/browse/TAJO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635727#comment-13635727 ]
Alexander Sibetheros commented on TAJO-36:
------------------------------------------
Hello,
I read the project summary above and it seems rather interesting. I haven't
yet looked at how Apache Tajo implements ExternalSortExec, but will do so
shortly.
I am a 4th-year student in Informatics (University of Athens), with a strong
background in algorithms (especially sorting) and strong C and C++ skills. I
recently took part in the SIGMOD 2013 Programming Contest (results pending),
which required a lot of research into algorithms and fast indexing and sorting
mechanisms.
Could you please point me to the correct place in the code to see how the sort
is implemented?
> Improve ExternalSortExec with N-merge sort and final pass omission
> ------------------------------------------------------------------
>
> Key: TAJO-36
> URL: https://issues.apache.org/jira/browse/TAJO-36
> Project: Tajo
> Issue Type: Improvement
> Components: physical operator
> Reporter: Hyunsik Choi
> Labels: gsoc, gsoc2013, mentor
>
> Background:
> The current ExternalSortExec uses only the binary external merge sort
> algorithm
> (http://en.wikipedia.org/wiki/External_sorting#External_merge_sort). In other
> words, in each pass, ExternalSortExec merges only two files at a time into one
> sorted file.
> Proposal:
> The goal of this proposal is to make the following improvements to
> ExternalSortExec:
> * N-merge sort - we can merge N files at each pass by using more memory. This
> will reduce the number of passes and, consequently, considerably reduce I/O
> overhead.
> * the final pass omission - a physical operator is pipelined with its parent
> operator, and the final pass of the merge sort is also invoked by the parent
> physical operator. So, we can omit the final pass as a separate step and let
> the parent consume the merged output directly.
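
The two ideas in the proposal can be illustrated with a small sketch. This is not Tajo code: it is Python rather than Java, in-memory lists stand in for sorted run files, and the names `merge_pass`, `external_sort`, `materialized_passes` and `fan_in` are hypothetical. It only shows how a larger merge fan-in lowers the number of passes that write intermediate output, and how the last merge can be handed to the consumer as a lazy iterator instead of being written out.

```python
import heapq
from typing import Iterator, List


def merge_pass(runs: List[List[int]], fan_in: int) -> List[List[int]]:
    """One materialized pass: merge groups of up to fan_in sorted runs.

    Each list stands in for a sorted run file (an assumption of this sketch).
    """
    return [
        list(heapq.merge(*runs[i:i + fan_in]))
        for i in range(0, len(runs), fan_in)
    ]


def external_sort(runs: List[List[int]], fan_in: int) -> Iterator[int]:
    """Merge runs until at most fan_in remain, then stream the last merge.

    The returned iterator is lazy, so the final merge only happens as the
    caller (the "parent operator" in this sketch) pulls values from it.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    while len(runs) > fan_in:
        runs = merge_pass(runs, fan_in)
    return heapq.merge(*runs)


def materialized_passes(num_runs: int, fan_in: int) -> int:
    """Count the passes that write intermediate runs before the final merge."""
    passes = 0
    while num_runs > fan_in:
        num_runs = -(-num_runs // fan_in)  # ceiling division
        passes += 1
    return passes
```

Under these assumptions, 16 initial runs need three materialized passes with a fan-in of 2 but only one with a fan-in of 4, and in both cases the last merge is consumed directly rather than written out first.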