[
https://issues.apache.org/jira/browse/TAJO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635727#comment-13635727
]
Alexander Sibetheros edited comment on TAJO-36 at 4/18/13 10:32 PM:
--------------------------------------------------------------------
Hello,
I read the above summary of the project and it seems quite interesting.
Although I don't have any history of contributing code to the Apache project, I
have just finished reading ExternalSortExec.java, and the implementation now
seems rather straightforward, so I believe that with some guidance I should be
able to handle the task.
Currently I am a 4th-year student in Informatics (University of Athens), with a
strong background in algorithms (especially sorting) and strong C, C++, Java,
and Python skills. I recently took part in the SIGMOD 2013 programming contest
(results pending), which required a lot of research into algorithms and fast
indexing and sorting mechanisms. This summer I have no remaining classes, and my
graduation thesis will begin in October, so I will have plenty of time to write
code, test thoroughly, and document.
was (Author: sib_):
Hello,
I read the above summary of the project and it seems quite interesting.
I am a 4th-year student in Informatics (University of Athens), with a strong
background in algorithms (especially sorting) and strong C, C++, Java, and
Python skills. I recently took part in the SIGMOD 2013 programming contest
(results pending), which required a lot of research into algorithms and fast
indexing and sorting mechanisms.
I have just finished reading ExternalSortExec.java, and the implementation now
seems rather straightforward, so I believe that with some guidance I should be
able to handle the task.
> Improve ExternalSortExec with N-merge sort and final pass omission
> ------------------------------------------------------------------
>
> Key: TAJO-36
> URL: https://issues.apache.org/jira/browse/TAJO-36
> Project: Tajo
> Issue Type: Improvement
> Components: physical operator
> Reporter: Hyunsik Choi
> Labels: gsoc, gsoc2013, mentor
>
> Background:
> The current ExternalSortExec uses only the binary external merge sort
> algorithm
> (http://en.wikipedia.org/wiki/External_sorting#External_merge_sort). In other
> words, in each pass ExternalSortExec merges only two sorted files at a time
> into one sorted file.
> Proposal:
> The goal of this proposal is to improve ExternalSortExec in the following
> two ways:
> * N-merge sort - by using more memory, we can merge N files at each pass.
> This reduces the number of passes and, consequently, considerably reduces
> I/O overhead.
> * final pass omission - physical operators are pipelined: the parent operator
> pulls tuples from its child. The final pass of the merge sort can therefore be
> driven directly by the parent physical operator as it consumes tuples, so we
> can omit writing the output of the final pass to disk.
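The two proposed improvements can be sketched together in Java. This is a minimal, hypothetical sketch, not Tajo's actual ExternalSortExec code: `NWayMergeSketch`, `mergePasses`, and `MergeIterator` are illustrative names, sorted runs are modeled as in-memory `Iterator<Integer>` instances instead of on-disk files of tuples, and the merge uses a `java.util.PriorityQueue` min-heap (one common way to implement an N-way merge; a loser tree is another). `mergePasses` counts how many passes are needed for a given fan-in, and `MergeIterator` performs the last merge lazily, so a parent operator could pull sorted values from it without the final pass being written out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not Tajo's API): N-way merge of sorted runs, exposed as
 * a pull-based Iterator so the final merge pass can be consumed directly by a
 * parent operator instead of being materialized.
 */
final class NWayMergeSketch {

    /** Passes needed to reduce `runs` sorted runs to one, merging `fanIn` runs at a time. */
    static int mergePasses(int runs, int fanIn) {
        int passes = 0;
        while (runs > 1) {
            runs = (runs + fanIn - 1) / fanIn; // ceil(runs / fanIn) runs remain
            passes++;
        }
        return passes;
    }

    /** Lazily merges N sorted iterators; each next() does O(log N) heap work. */
    static final class MergeIterator implements Iterator<Integer> {
        // Heap entries: {current value, index of the run it came from}
        private final PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        private final List<Iterator<Integer>> runs;

        MergeIterator(List<Iterator<Integer>> runs) {
            this.runs = runs;
            // Seed the heap with the head of every non-empty run
            for (int i = 0; i < runs.size(); i++) {
                if (runs.get(i).hasNext()) {
                    heap.add(new int[] {runs.get(i).next(), i});
                }
            }
        }

        @Override
        public boolean hasNext() {
            return !heap.isEmpty();
        }

        @Override
        public Integer next() {
            if (heap.isEmpty()) {
                throw new NoSuchElementException();
            }
            int[] top = heap.poll();
            // Refill from the run that supplied the smallest value
            Iterator<Integer> src = runs.get(top[1]);
            if (src.hasNext()) {
                heap.add(new int[] {src.next(), top[1]});
            }
            return top[0];
        }
    }

    public static void main(String[] args) {
        // 16 initial runs: binary merge needs 4 passes, a 16-way merge needs 1
        System.out.println(mergePasses(16, 2));  // 4
        System.out.println(mergePasses(16, 16)); // 1

        List<Iterator<Integer>> runs = new ArrayList<>();
        runs.add(Arrays.asList(1, 4, 7).iterator());
        runs.add(Arrays.asList(2, 5, 8).iterator());
        runs.add(Arrays.asList(3, 6, 9).iterator());

        // A parent operator would pull values one at a time; nothing is spilled here
        MergeIterator it = new MergeIterator(runs);
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) {
            sb.append(it.next()).append(' ');
        }
        System.out.println(sb.toString().trim()); // 1 2 3 4 5 6 7 8 9
    }
}
```

The trade-off behind N-merge sort shows up in `mergePasses`: a larger fan-in cuts the number of passes (and therefore full reads and writes of the data), but it needs roughly one input buffer per run being merged, which is why the proposal ties N to the available memory.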