Re: What do you want out of Apache Drill?

David Alves Wed, 13 Mar 2013 10:06:49 -0700

@Jacques: +1 on pretty much all you said. I, personally, will be focusing on 
those as soon as I'm able to get something running.
@Ted: good to know there is no major sentiment against large joins. The 
required infrastructure for performant large joins should also allow for 
performant cogroups.


-david

On Mar 13, 2013, at 11:42 AM, Jacques Nadeau <[email protected]> wrote:

> I have a feeling that large joins will be dealt with sooner rather than
> later (especially with interest and work from people like you).  If you
> look at large queries, things are dominated by large sorts, large joins and
> large group-by aggregations.  We need to make sure those are performant in
> large clusters before we focus on the prettier things.  Hopefully we can
> leverage Google Compute Engine to ensure this.
> 
> 
> 
> On Wed, Mar 13, 2013 at 7:07 AM, David Alves <[email protected]> wrote:
> 
>> Hi All
>> 
>>        Sorry to revive an old thread…
>>        I was going through the list looking for the current stance on
>> joins and I found Ted's answer.
>>        What is the main point behind not doing large joins on Drill?
>>        Is it just simplicity (as in optimizer, etc.) or is there
>> something else?
>>        I mention this because I'm particularly interested in large self
>> joins (I can volunteer to work on them myself, of course).
>>        I'm not against leaving them out of any optimizer goals, if one
>> can explicitly select an identity optimizer that will just follow the
>> logical plan, but they are a big requirement for me.
>>        Thoughts?
>> 
>> Best
>> David
>> 
>> On Dec 6, 2012, at 7:33 PM, Ted Dunning <[email protected]> wrote:
>> 
>>> Drill is explicitly designed (at this time) with the option of not doing
>>> large joins.  Triple stores pretty much assume lots of large joins.
>>> 
>>> That said, if you could write some suggested typical queries, it would help
>>> the discussion along.  If you could go so far as to translate to a logical
>>> plan, that would be even cooler.
>>> 
>>> On Fri, Dec 7, 2012 at 2:25 AM, Mike Kogan <[email protected]> wrote:
>>> 
>>>> I would very much be interested in having a SPARQL interface, though I am
>>>> not sure how well Drill will handle many joins.
>>>> 
>>>> 
>>>> On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning <[email protected]> wrote:
>>>> 
>>>>> On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <[email protected]> wrote:
>>>>> 
>>>>>> ...
>>>>>> 1 A SQL interface (in addition to DrQL interface)
>>>>>> 
>>>>> 
>>>>> With your help, this may arrive before DrQL is integrated.
>>>>> 
>>>>> 
>>>>>> 2 JDBC driver
>>>>>> 
>>>>> 
>>>>> Should be pretty straightforward.  Not on anybody's task list just yet, I
>>>>> don't think.
>>>>> 
>>>>> 
>>>>>> 3 Access to the stack at a lower level (i.e. a way to use the
>>>>>> high-performance scan operators without writing a query)
>>>>>> 
>>>>> 
>>>>> Definitely going to happen.
>>>>> 
>>>>> 
>>>>>> 4 Ability to query in-memory Java data in a compact form (e.g. arrays of
>>>>>> primitives or nio buffers)
>>>>>> 
>>>>> 
>>>>> I wonder if this is just a matter of writing a special scanner or a special
>>>>> flavor of join at the execution point.  The scanner would fit the case
>>>>> where the in-memory compact form is only readable sequentially; the
>>>>> join operator would fit if the memory can be accessed at random.
>>>>> 
>>>>> ...
>>>>>> I know some of these are outside of Drill's scope. If so, feel free to
>>>>>> disregard. But if you don't ask, you don't get. :)
>>>>>> 
>>>>> 
>>>>> They all look pretty reasonable to me.
>>>>> 
>>>> 
>> 
>>
