On Wed, Mar 13, 2013 at 7:07 AM, David Alves <[email protected]> wrote:
> I was going through the list looking for the current stance on
> joins and I found Ted's answer.
> What is the main point behind not doing large joins on Drill?
>

Not doing large joins *yet*.

> Is it just simplicity (as in optimizer, etc.) or is there
> something else?
>

Simplicity in early implementation.

> I mention this because I'm particularly interested in large self
> joins (I can volunteer to work on them myself, of course).
>

I would love to see large self joins.

In Pig notation, I would be interested in a co-group of multiple fields on a single key field, followed by counting all pairs within each group. Counting the cross-group pairs is also interesting.

Saying this concisely in SQL is hard for me, especially since I would like to down-sample each of the groups. I can say it with many queries and multiple temp tables, but I expect that would be difficult for the optimizer to understand. I can also say it concisely in Drill's intermediate language.
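To make the computation concrete, here is a minimal single-machine sketch in Python of what it would produce. It assumes the input is flat (key, field, item) tuples. The function names, the per-group cap `max_per_group`, and the choice to count each distinct pair at most once per key are illustrative assumptions, not Drill or Pig APIs. It shows the semantics only, not how Drill would plan or distribute the work.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations, product


def cogroup(records):
    """Group (key, field, item) records by key, keeping each field's items separate.

    Roughly analogous to a Pig COGROUP on the key across several inputs.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for key, field, item in records:
        groups[key][field].append(item)
    return groups


def downsample(items, max_size, rng):
    """Keep at most max_size items, chosen uniformly at random without replacement."""
    if len(items) <= max_size:
        return list(items)
    return rng.sample(items, max_size)


def count_pairs(records, max_per_group=100, seed=0):
    """Count item pairs that co-occur under the same key.

    Returns (within, cross):
      within[(field, a, b)]: pairs of distinct items from the same field, a < b
      cross[(f1, a, f2, b)]: pairs of items from two different fields, f1 < f2
    Each distinct pair is counted at most once per key (an assumption).
    """
    rng = random.Random(seed)  # seeded so the down-sampling is reproducible
    within = Counter()
    cross = Counter()
    for fields in cogroup(records).values():
        # Down-sample each field's group so one huge key cannot dominate the cost.
        sampled = {f: downsample(items, max_per_group, rng) for f, items in fields.items()}
        # Pairs inside a single group (the self-join part).
        for field, items in sampled.items():
            for a, b in combinations(sorted(set(items)), 2):
                within[(field, a, b)] += 1
        # Pairs across two different groups sharing the same key.
        for f1, f2 in combinations(sorted(sampled), 2):
            for a, b in product(sorted(set(sampled[f1])), sorted(set(sampled[f2]))):
                cross[(f1, a, f2, b)] += 1
    return within, cross
```

For example, with records such as `("u1", "viewed", "a")`, `("u1", "bought", "a")`, and so on, `count_pairs(records)` returns how often each pair of viewed items appears under the same user, plus how often each (bought, viewed) combination does. Capping each group before pairing keeps the quadratic pair count bounded per key, which is the reason to down-sample.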
