Merge Join Status:

• New changes are on GitHub (https://github.com/stephenermshar/asterixdb/tree/stephenermshar/merge-join/master) and gerrit (https://asterix-gerrit.ics.uci.edu/#/c/3478/).
• The main function of the merge-join has been reorganized to be cleaner and easier to follow.
• Spilling to disk is implemented and works in a test query.
• There is a working hint for SQLPP queries.
• The next step is to clean up and organize the code; there are a few places in the main class of the joiner that could be much clearer.
• We should be doing code reviews and wrapping the change up in the next couple of weeks.

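For readers following the thread, here is a minimal sketch of the idea behind a merge join that spills when memory is limited. It is written in Python for brevity and is not the code in the gerrit change; the names `SpillableBuffer`, `merge_join`, and `mem_limit` are hypothetical, and a pickled temporary file stands in for a run file. It assumes both inputs are already sorted by key.

```python
import pickle
import tempfile


class SpillableBuffer:
    """Keeps up to `limit` tuples in memory; the rest go to a temp "run file".

    Hypothetical helper for illustration only.
    """

    def __init__(self, limit):
        self.limit = limit
        self.in_memory = []
        self.run_file = None  # created lazily on the first spill
        self.spilled = 0

    def append(self, item):
        if len(self.in_memory) < self.limit:
            self.in_memory.append(item)
            return
        if self.run_file is None:
            self.run_file = tempfile.TemporaryFile()
        self.run_file.seek(0, 2)  # always write at the end of the file
        pickle.dump(item, self.run_file)
        self.spilled += 1

    def __iter__(self):
        # Replay the in-memory part first, then re-read the spilled part.
        yield from self.in_memory
        if self.run_file is not None:
            self.run_file.seek(0)
            for _ in range(self.spilled):
                yield pickle.load(self.run_file)

    def close(self):
        if self.run_file is not None:
            self.run_file.close()


def merge_join(left, right, mem_limit=2):
    """Equi-join two iterables of (key, value) pairs that are sorted by key."""
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left_it, None)
        elif l[0] > r[0]:
            r = next(right_it, None)
        else:
            key = l[0]
            # Buffer every right tuple with this key so it can be replayed.
            group = SpillableBuffer(mem_limit)
            while r is not None and r[0] == key:
                group.append(r)
                r = next(right_it, None)
            # Replay the buffered group once per left tuple with the same key.
            while l is not None and l[0] == key:
                for _, r_val in group:
                    yield (key, l[1], r_val)
                l = next(left_it, None)
            group.close()
```

Only the group of right-side tuples that share the current key is buffered, so memory pressure arises only when one key has many duplicates; that is the case the spill path covers in this sketch.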
- Stephen Ermshar
On Jul 23, 2019, 4:25 PM -0700, Stephen Ermshar <[email protected]>, wrote:
> GSoC 2019: Implementing Merge Join:
>
> • We have a working simple merge joiner implementation in a gerrit change
> here (https://asterix-gerrit.ics.uci.edu/#/c/3478/).
> • This next week we’ll be working on handling limited memory by spilling with
> a run file.
> • The next step will be to set up the hint and optimizer to use the merge
> joiner when appropriate; currently we have added a condition in JoinUtils
> that forces it to always use the merge joiner.
>
> Multi Partitioning / Partial Broadcast Operator:
>
> • We’ve also been working on a separate lower-priority change that is mainly
> focused on supporting interval join partitioning of Project, Split, and
> Replicate partitioning schemes.
> • That change is also on gerrit here
> (https://asterix-gerrit.ics.uci.edu/#/c/3223/) with more details in the
> commit message.
>
> - Stephen Ermshar
> On Jul 2, 2019, 1:05 PM -0700, Stephen Ermshar <[email protected]>,
> wrote:
> > Here is the first update for GSoC 2019: Implementing Merge Join. I plan on
> > writing updates like this every two weeks. Let me know if there are any
> > thoughts on the project or these updates. Thanks!
> >
> > Our current status:
> >
> > • My fork of AsterixDB for GSoC is here
> > (https://github.com/stephenermshar/asterixdb).
> > • When we started we worked on pulling in old merge-join code to start off
> > with, but then decided to put end-to-end tests for the query and the
> > optimizer in place first.
> > • The optimizer test is currently blank. The runtime test in
> > `merge-join/equi-join` is based on the `btree-index-nested-loop-join` test
> > and has a merge-join hint.
> > • Those tests are in my fork on the branch
> > [stephenermshar/merge-join/Import-original-hyracks-merge-join](https://github.com/stephenermshar/asterixdb/tree/stephenermshar/merge-join/import-original-hyracks-merge-join).
> >
> > The general plan at this point is to add a Merge Join Operator. It will
> > have an activity to take two data input streams, and a joiner activity that
> > can request data from the input activity when it’s ready. Our first
> > implementation won’t include spilling to disk.
> >
> > - Stephen Ermshar
> > On May 28, 2019, 2:35 PM -0700, Mike Carey <[email protected]>, wrote:
> > > Cool! I would be happy to be looped in occasionally as well - so I will
> > > watch this space for the public progress updates. In addition to adding
> > > the algorithm, we'll need to add an optimizer rule (actually update an
> > > existing rule) so that it gets picked when appropriate - and also a hint
> > > so that you can manually suggest that it be picked regardless. DB2
> > > chooses this join method when the incoming data arguments are already
> > > sorted, so that would likely be a good heuristic for us too. (It is
> > > sure to be cheaper than anything else in that case.) (So think about
> > > this as adding a Merge Join, not a Sort Merge Join, actually, as it is
> > > not likely to be the case that sorting and then doing this will be a
> > > cost win.)
> > >
> > > On 5/26/19 9:22 PM, Stephen Ermshar (gmail) wrote:
> > > > Hi Ali,
> > > >
> > > > Since the coding period starts tomorrow I wanted to get in touch with
> > > > you again.
> > > >
> > > > Preston and I were thinking it would be good to post weekly reports
> > > > here on the dev mailing list to keep a public record of our progress.
> > > > We’d also like to have weekly or twice-weekly meetings to track our
> > > > progress and direction. If you have any thoughts on how we can organize
> > > > our efforts more this summer let me know.
> > > >
> > > > This week Preston and I decided we’d focus on refreshing my knowledge
> > > > of the Sort Merge Join algorithm. I’d like to also get a fresh
> > > > development environment set up for this summer and if possible start
> > > > moving old code in from the outdated repository.
> > > >
> > > > I’m excited to start working on this!
> > > >
> > > > - Stephen Ermshar
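
The selection heuristic discussed in the thread (pick the merge join when both inputs already arrive sorted on the join keys, or when a hint asks for it) could be sketched roughly as follows. This is an illustration in Python, not the AsterixDB optimizer rule; `InputProps`, `choose_join`, the `"merge"` hint value, and the `"default"` fallback label are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class InputProps:
    """Hypothetical description of one join input's delivered ordering."""
    sorted_on: Optional[Tuple[str, ...]]  # ordering columns, or None if unsorted


def _covers(props: InputProps, keys: Sequence[str]) -> bool:
    # The input is usable if its ordering starts with the join keys.
    return props.sorted_on is not None and props.sorted_on[:len(keys)] == tuple(keys)


def choose_join(left: InputProps, right: InputProps,
                left_keys: Sequence[str], right_keys: Sequence[str],
                hint: Optional[str] = None) -> str:
    """Return "merge" or a placeholder "default" for whatever would otherwise be chosen."""
    if hint == "merge":
        # A hint requests the merge join regardless of input ordering.
        return "merge"
    if _covers(left, left_keys) and _covers(right, right_keys):
        # Both sides are already sorted, so no extra sort is needed.
        return "merge"
    return "default"
```

Without a hint, the sketch never adds a sort just to enable the merge join, which matches the point above that this is a Merge Join rather than a Sort Merge Join.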
