Re: GSoC 2019

Stephen Ermshar Thu, 08 Aug 2019 22:40:45 -0700

Merge Join Status:

• New changes are on GitHub
(https://github.com/stephenermshar/asterixdb/tree/stephenermshar/merge-join/master)
and Gerrit (https://asterix-gerrit.ics.uci.edu/#/c/3478/).
    • The main function of the merge-join has been reorganized to be cleaner 
and easier to follow.
    • Spilling to disk is implemented and works in a test query.
    • There is a working hint for SQLPP queries.
• The next step is to clean up and organize the code; there are a few places in 
the main class of the joiner that could be much clearer.
• We should be doing code reviews and wrapping the change up in the next couple 
of weeks.
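
For readers following along, the core idea of a merge join over two inputs that are already sorted on the join key can be sketched as below. This is a minimal illustration in Python, not the Hyracks joiner from the gerrit change: the `merge_join` function and the `(key, payload)` row shape are assumptions made for the example, and spilling is only pointed at in a comment.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[int, str]  # (join key, payload); hypothetical row shape


def merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Tuple[int, str, str]]:
    """Equi-join two inputs that are both sorted by key in ascending order."""
    right_iter = iter(right)
    right_row: Optional[Row] = next(right_iter, None)
    group: List[Row] = []            # right rows that share the current key
    group_key: Optional[int] = None  # key the buffered group belongs to

    for l_key, l_val in left:
        if l_key != group_key:
            # New left key: drop the old group and advance the right side.
            group = []
            while right_row is not None and right_row[0] < l_key:
                right_row = next(right_iter, None)
            # Buffer every right row with this key. With limited memory,
            # this buffer is the part that would have to spill to a run
            # file (not shown here).
            while right_row is not None and right_row[0] == l_key:
                group.append(right_row)
                right_row = next(right_iter, None)
            group_key = l_key
        # Pair the left row with each buffered right row of the same key.
        for _, r_val in group:
            yield (l_key, l_val, r_val)
```

Each input is read once, front to back; only the right-side rows for the current key are held in memory, so duplicate keys on the left can be matched without rewinding the right input.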



- Stephen Ermshar
On Jul 23, 2019, 4:25 PM -0700, Stephen Ermshar wrote:
> GSoC 2019: Implementing Merge Join:
>
> • We have a simple, working merge joiner implementation in a Gerrit change 
> here (https://asterix-gerrit.ics.uci.edu/#/c/3478/).
> • This next week we’ll be working on handling limited memory by spilling with 
> a run file.
> • The next step will be to set up the hint and optimizer to use the merge 
> joiner when appropriate; currently we have added a condition in JoinUtils 
> that forces it to always use the merge joiner.
>
>
> Multi Partitioning / Partial Broadcast Operator:
>
> • We’ve also been working on a separate, lower-priority change that is mainly 
> focused on supporting interval join partitioning of Project, Split, and 
> Replicate partitioning schemes.
> • That change is also on Gerrit here 
> (https://asterix-gerrit.ics.uci.edu/#/c/3223/) with more details in the 
> commit message.
>
>
> - Stephen Ermshar
> On Jul 2, 2019, 1:05 PM -0700, Stephen Ermshar wrote:
> > Here is the first update for GSoC 2019: Implementing Merge Join. I plan on 
> > writing updates like this every two weeks. Let me know if you have any 
> > thoughts on the project or these updates. Thanks!
> >
> > Our current status:
> >
> > • My fork of AsterixDB for GSoC is here 
> > (https://github.com/stephenermshar/asterixdb).
> > • When we started, we worked on pulling in old merge-join code to start off 
> > with, but then decided to put end-to-end tests for the query and the 
> > optimizer in place first.
> >     • The optimizer test is currently blank. The runtime test in 
> > `merge-join/equi-join` is based on the `btree-index-nested-loop-join` test 
> > and has a merge-join hint.
> >     • Those tests are in my fork on the branch 
> > [stephenermshar/merge-join/Import-original-hyracks-merge-join](https://github.com/stephenermshar/asterixdb/tree/stephenermshar/merge-join/import-original-hyracks-merge-join).
> >
> > The general plan at this point is to add a Merge Join Operator. It will 
> > have an activity to take two data input streams, and a joiner activity that 
> > can request data from the input activity when it’s ready. Our first 
> > implementation won’t include spilling to disk.
> >
> > - Stephen Ermshar
> > On May 28, 2019, 2:35 PM -0700, Mike Carey wrote:
> > > Cool!  I would be happy to be looped in occasionally as well - so I will
> > > watch this space for the public progress updates.  In addition to adding
> > > the algorithm, we'll need to add an optimizer rule (actually update an
> > > existing rule) so that it gets picked when appropriate - and also a hint
> > > so that you can manually suggest that it be picked regardless.  DB2
> > > chooses this join method when the incoming data arguments are already
> > > sorted, so that would likely be a good heuristic for us too.  (It is
> > > sure to be cheaper than anything else in that case.)  (So think about
> > > this as adding a Merge Join, not a Sort Merge Join, actually, as it is not
> > > likely to be the case that sorting and then doing this will be a cost 
> > > win.)
> > >
> > > On 5/26/19 9:22 PM, Stephen Ermshar (gmail) wrote:
> > > > Hi Ali,
> > > >
> > > > Since the coding period starts tomorrow I wanted to get in touch with 
> > > > you again.
> > > >
> > > > Preston and I were thinking it would be good to post weekly reports 
> > > > here on the dev mailing list to keep a public record of our progress. 
> > > > We’d also like to have weekly or twice-weekly meetings to track our 
> > > > progress and direction. If you have any thoughts on how we can better 
> > > > organize our efforts this summer, let me know.
> > > >
> > > > This week Preston and I decided we’d focus on refreshing my knowledge 
> > > > of the Sort Merge Join algorithm. I’d also like to get a fresh 
> > > > development environment set up for this summer and, if possible, start 
> > > > moving old code in from the outdated repository.
> > > >
> > > > I’m excited to start working on this!
> > > >
> > > > - Stephen Ermshar
> > > >
