On Wed, Mar 13, 2013 at 7:07 AM, David Alves <[email protected]> wrote:

>         I was going through the list looking for the current stance on
> joins and I found Ted's answer.
>         What is the main point behind not doing large joins on Drill?
>

Not doing large joins *yet*.


>          Is it just simplicity (as in optimizer, etc.) or is there
> something else?
>

Simplicity in early implementation.


>          I mention this because I'm particularly interested in large self
> joins (I can volunteer to work on them myself, of course).
>

I would love to see large self joins.  In pig notation, I would be
interested in co-group of multiple fields on a single key field followed by
counting of all pairs in each of the groups.  Counting the cross-group
pairs is also interesting.  Saying this concisely in SQL is hard for me,
especially since I would like to down-sample each of the groups.  I can say
it with many queries and multiple temp tables, but I expect that this would
be difficult for the optimizer to understand.  I can also say it concisely
in Drill's intermediate language.
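
For illustration only, here is a rough Python sketch of one way to read
the co-group described above.  It is not Drill or Pig code, and the names
cogroup, downsample and count_pairs, the limit and seed parameters, and the
record layout are all hypothetical.  It assumes records are dicts, that
values within a field are sortable, and that "pairs in each of the groups"
means unordered pairs inside one field's bag while "cross-group pairs"
means pairs drawn from two different bags under the same key.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations, product


def cogroup(records, key_field, value_fields):
    """Collect each field's values per key: {key: {field: [values]}}."""
    groups = defaultdict(lambda: {f: [] for f in value_fields})
    for rec in records:
        for f in value_fields:
            if rec.get(f) is not None:  # skip missing values
                groups[rec[key_field]][f].append(rec[f])
    return groups


def downsample(values, limit, rng):
    """Keep at most `limit` values, chosen uniformly without replacement."""
    return list(values) if len(values) <= limit else rng.sample(values, limit)


def count_pairs(records, key_field, field_a, field_b, limit=100, seed=0):
    """Count within-group and cross-group pairs over all keys."""
    rng = random.Random(seed)  # fixed seed so the sampling is repeatable
    within = {field_a: Counter(), field_b: Counter()}
    cross = Counter()
    for bags in cogroup(records, key_field, [field_a, field_b]).values():
        sampled = {f: downsample(bags[f], limit, rng) for f in (field_a, field_b)}
        for f, values in sampled.items():
            # Sorting makes (x, y) and (y, x) land on the same counter key.
            for pair in combinations(sorted(values), 2):
                within[f][pair] += 1
        for pair in product(sampled[field_a], sampled[field_b]):
            cross[pair] += 1
    return within, cross


# Hypothetical sample data: one row per (user, viewed item, bought item).
records = [
    {"user": "u1", "viewed": "A", "bought": "X"},
    {"user": "u1", "viewed": "B", "bought": None},
    {"user": "u2", "viewed": "A", "bought": "Y"},
    {"user": "u2", "viewed": "B", "bought": "X"},
]
within, cross = count_pairs(records, "user", "viewed", "bought")
```

Down-sampling each bag before pairing keeps the work per key bounded by
roughly limit squared, which matters because pair counts grow
quadratically with group size.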
