Here is the first update for GSoC 2019: Implementing Merge Join. I plan on writing 
updates like this every two weeks. Let me know if there are any thoughts on the 
project or these updates. Thanks!

Our current status:

• My fork of AsterixDB for GSoC is here 
(https://github.com/stephenermshar/asterixdb).
• When we started, we worked on pulling in the old merge-join code as a starting 
point, but then decided to put end-to-end tests for the query and the optimizer 
in place first.
    • The optimizer test is currently blank. The runtime test in 
`merge-join/equi-join` is based on the `btree-index-nested-loop-join` test and 
has a merge-join hint.
    • Those tests are in my fork on the branch 
[stephenermshar/merge-join/import-original-hyracks-merge-join](https://github.com/stephenermshar/asterixdb/tree/stephenermshar/merge-join/import-original-hyracks-merge-join).

The general plan at this point is to add a Merge Join Operator. It will have an 
activity to take two data input streams, and a joiner activity that can request 
data from the input activity when it’s ready. Our first implementation won’t 
include spilling to disk.
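
To make the joiner idea concrete, here is a minimal in-memory sketch of the merge step on two inputs that are already sorted by the join key. It is only an illustration under assumptions: `MergeJoinSketch`, `Tuple`, and `mergeJoin` are hypothetical names, not the Hyracks operator or activity API, and plain lists stand in for the two input streams. Like the planned first implementation, it keeps everything in memory and does not spill to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeJoinSketch {

    /** Hypothetical tuple: an integer join key plus an opaque payload. */
    static final class Tuple {
        final int key;
        final String payload;

        Tuple(int key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    /**
     * Equi-join of two inputs that are ALREADY sorted ascending by key.
     * Each output string is "leftPayload|rightPayload".
     */
    static List<String> mergeJoin(List<Tuple> left, List<Tuple> right) {
        List<String> out = new ArrayList<>();
        int l = 0;
        int r = 0;
        while (l < left.size() && r < right.size()) {
            int lk = left.get(l).key;
            int rk = right.get(r).key;
            if (lk < rk) {
                l++; // left side is behind, advance it
            } else if (lk > rk) {
                r++; // right side is behind, advance it
            } else {
                // Find the end of the run of right tuples sharing this key.
                int rEnd = r;
                while (rEnd < right.size() && right.get(rEnd).key == lk) {
                    rEnd++;
                }
                // Pair every left tuple with this key against the whole right run.
                while (l < left.size() && left.get(l).key == lk) {
                    for (int i = r; i < rEnd; i++) {
                        out.add(left.get(l).payload + "|" + right.get(i).payload);
                    }
                    l++;
                }
                r = rEnd; // the right run is fully consumed
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tuple> left = Arrays.asList(
                new Tuple(1, "a"), new Tuple(2, "b"), new Tuple(2, "c"), new Tuple(4, "d"));
        List<Tuple> right = Arrays.asList(
                new Tuple(2, "x"), new Tuple(2, "y"), new Tuple(3, "z"), new Tuple(4, "w"));
        System.out.println(mergeJoin(left, right)); // [b|x, b|y, c|x, c|y, d|w]
    }
}
```

The run of equal keys on the right side is the only state the sketch has to hold on to while the left side advances. In a streaming operator that run would have to be buffered, which is presumably where spilling would come in later.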

- Stephen Ermshar
On May 28, 2019, 2:35 PM -0700, Mike Carey <[email protected]>, wrote:
> Cool!  I would be happy to be looped in occasionally as well - so I will
> watch this space for the public progress updates.  In addition to adding
> the algorithm, we'll need to add an optimizer rule (actually update an
> existing rule) so that it gets picked when appropriate - and also a hint
> so that you can manually suggest that it be picked regardless.  DB2
> chooses this join method when the incoming data arguments are already
> sorted, so that would likely be a good heuristic for us too.  (It is
> sure to be cheaper than anything else in that case.)  (So think about
> this as adding a Merge Join, not a Sort Merge Join, actually, as it is not
> likely to be the case that sorting and then doing this will be a cost win.)
>
> On 5/26/19 9:22 PM, Stephen Ermshar (gmail) wrote:
> > Hi Ali,
> >
> > Since the coding period starts tomorrow I wanted to get in touch with you 
> > again.
> >
> > Preston and I were thinking it would be good to post weekly reports here on 
> > the dev mailing list to keep a public record of our progress. We’d also 
> > like to have weekly or twice-weekly meetings to track our progress and 
> > direction. If you have any thoughts on how we can organize our efforts more 
> > this summer let me know.
> >
> > This week Preston and I decided we’d focus on refreshing my knowledge of 
> > the Sort Merge Join algorithm. I’d like to also get a fresh development 
> > environment set up for this summer and, if possible, start moving old code in 
> > from the outdated repository.
> >
> > I’m excited to start working on this!
> >
> > - Stephen Ermshar
> >
