@Jacques: +1 on pretty much all you said. I'll personally be focusing on those as soon as I'm able to get something running.
@Ted: good to know there is no major sentiment against large joins. The infrastructure required for performant large joins should also allow for performant cogroups.
-david

On Mar 13, 2013, at 11:42 AM, Jacques Nadeau wrote:

> I have a feeling that large joins will be dealt with sooner rather than
> later (especially with interest and work from people like you). If you
> look at large queries, things are dominated by large sorts, large joins and
> large group-by aggregations. We need to make sure those are performant in
> large clusters before we focus on the prettier things. Hopefully we can
> leverage Google Compute Engine to ensure this.
>
> On Wed, Mar 13, 2013 at 7:07 AM, David Alves wrote:
>
>> Hi All
>>
>> Sorry to revive an old thread…
>> I was going through the list looking for the current stance on
>> joins and I found Ted's answer.
>> What is the main point behind not doing large joins in Drill?
>> Is it just simplicity (as in optimizer, etc.) or is there
>> something else?
>> I mention this because I'm particularly interested in large self
>> joins (I can volunteer to work on them myself, of course).
>> I'm not against leaving them out of any optimizer goals, if one
>> can explicitly select an identity optimizer that will just follow the
>> logical plan, but they are a big requirement for me.
>> Thoughts?
>>
>> Best
>> David
>>
>> On Dec 6, 2012, at 7:33 PM, Ted Dunning wrote:
>>
>>> Drill is explicitly designed (at this time) with the option of not doing
>>> large joins. Triple stores pretty much assume lots of large joins.
>>>
>>> That said, if you could write some suggested typical queries, it would help
>>> the discussion along. If you could go so far as to translate them to a
>>> logical plan, that would be even cooler.
>>>
>>> On Fri, Dec 7, 2012 at 2:25 AM, Mike Kogan wrote:
>>>
>>>> I would very much be interested in having a SPARQL interface, though I am
>>>> not sure how well Drill will handle many joins.
>>>>
>>>> On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning wrote:
>>>>
>>>>> On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde wrote:
>>>>>
>>>>>> ...
>>>>>> 1. A SQL interface (in addition to DrQL interface)
>>>>>>
>>>>>
>>>>> With your help, this may arrive before DrQL is integrated.
>>>>>
>>>>>> 2. JDBC driver
>>>>>>
>>>>>
>>>>> Should be pretty straightforward. Not on anybody's task list just yet, I
>>>>> don't think.
>>>>>
>>>>>> 3. Access to the stack at a lower level (i.e. a way to use the
>>>>>> high-performance scan operators without writing a query)
>>>>>>
>>>>>
>>>>> Definitely going to happen.
>>>>>
>>>>>> 4. Ability to query in-memory Java data in a compact form (e.g. arrays of
>>>>>> primitives or nio buffers)
>>>>>>
>>>>>
>>>>> I wonder if this is just a matter of writing a special scanner or a special
>>>>> flavor of join at the execution point: the scanner for the case where the
>>>>> in-memory compact form is only readable sequentially, the join operator
>>>>> if the memory can be accessed at random.
>>>>>
>>>>> ...
>>>>>> I know some of these are outside of Drill's scope. If so, feel free to
>>>>>> disregard. But if you don't ask, you don't get. :)
>>>>>>
>>>>>
>>>>> They all look pretty reasonable to me.
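David's remark that the infrastructure for performant large joins should also allow for performant cogroups can be made concrete with a small sketch. Assuming both inputs have already been sorted by key (the kind of large sort Jacques lists alongside large joins), a single merge pass produces cogroups, and an inner join is then just the per-key cross product of each cogroup. This is a hypothetical, single-machine Python illustration using sort-merge as the strategy; it is not Drill's operator code, and `cogroup` and `inner_join` are illustrative names only.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Iterable, Iterator, List, Tuple

Row = Tuple[Any, Any]  # hypothetical row shape: (key, value)


def cogroup(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Tuple[Any, List[Any], List[Any]]]:
    """Merge two key-sorted inputs, yielding (key, left_values, right_values) per key."""
    lg = groupby(left, key=itemgetter(0))
    rg = groupby(right, key=itemgetter(0))
    l = next(lg, None)
    r = next(rg, None)
    while l is not None or r is not None:
        if r is None or (l is not None and l[0] < r[0]):
            # Key present only on the left side.
            yield l[0], [v for _, v in l[1]], []
            l = next(lg, None)
        elif l is None or r[0] < l[0]:
            # Key present only on the right side.
            yield r[0], [], [v for _, v in r[1]]
            r = next(rg, None)
        else:
            # Key present on both sides: emit both value groups together.
            yield l[0], [v for _, v in l[1]], [v for _, v in r[1]]
            l = next(lg, None)
            r = next(rg, None)


def inner_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Tuple[Any, Any, Any]]:
    """An inner join expressed on top of cogroup: cross product per matching key."""
    for key, lvals, rvals in cogroup(left, right):
        for lv in lvals:
            for rv in rvals:
                yield key, lv, rv


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c")]
    right = [(2, "x"), (3, "y")]
    print(list(cogroup(left, right)))
    print(list(inner_join(left, right)))
```

The design point the sketch is meant to show: the expensive part (getting both sides sorted or otherwise co-partitioned by key) is shared, and the join is a thin layer over the cogroup. A distributed version would also need key partitioning across nodes and spilling of sorted runs to disk, none of which is modeled here; hash-based cogrouping is an alternative to the sort-merge approach shown.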
