Hi All
Sorry to revive an old thread…
I was going through the list looking for the current stance on joins
and I found Ted's answer.
What is the main point behind not doing large joins on Drill?
Is it just simplicity (as in optimizer, etc.) or is there something
else?
I mention this because I'm particularly interested in large self joins
(I can volunteer to work on them myself, of course).
I'm not against leaving them out of any optimizer goals, if one can
explicitly select an identity optimizer that will just follow the logical plan,
but they are a big requirement for me.
Thoughts?
Best
David
On Dec 6, 2012, at 7:33 PM, Ted Dunning <[email protected]> wrote:
> Drill is explicitly designed (at this time) with the option of not doing
> large joins. Triple stores pretty much assume lots of large joins.
>
> That said, if you could write some suggested typical queries, it would help
> the discussion along. If you could go so far as to translate to a logical
> plan, that would be even cooler.
>
> On Fri, Dec 7, 2012 at 2:25 AM, Mike Kogan <[email protected]> wrote:
>
>> I would very much be interested in having a SPARQL interface, though I am
>> not sure how well Drill will handle many joins.
>>
>>
>> On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning <[email protected]> wrote:
>>
>>> On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <[email protected]> wrote:
>>>
>>>> ...
>>>> 1 A SQL interface (in addition to DrQL interface)
>>>>
>>>
>>> With your help, this may arrive before DrQL is integrated.
>>>
>>>
>>>> 2 JDBC driver
>>>>
>>>
>>> Should be pretty straightforward. Not on anybody's task list just yet, I
>>> don't think.
>>>
>>>
>>>> 3 Access to the stack at a lower level (i.e. a way to use the
>>>> high-performance scan operators without writing a query)
>>>>
>>>
>>> Definitely going to happen.
>>>
>>>
>>>> 4 Ability to query in-memory Java data in a compact form (e.g. arrays of
>>>> primitives or nio buffers)
>>>>
>>>
>>> I wonder if this is just a matter of writing a special scanner or a special
>>> flavor of join at the execution point. The scanner for the case where the
>>> in-memory compact form is only readable in sequential form. The
>>> join-operator if the memory can be accessed at random.
>>>
>>> ...
>>>> I know some of these are outside of Drill's scope. If so, feel free to
>>>> disregard. But if you don't ask, you don't get. :)
>>>>
>>>
>>> They all look pretty reasonable to me.
>>>
>>