Don't worry Tim, it is still very much on my radar. Just well ahead of the ref interpreter stuff. Let me see what I can slice up in the next few days.
J

On Wed, Mar 13, 2013 at 2:40 PM, Timothy Chen <[email protected]> wrote:
> Looking forward to the plumbing as well, since my json scan op has sat
> there for a while now :)
>
> Tim
>
> On Wed, Mar 13, 2013 at 2:30 PM, David Alves <[email protected]> wrote:
> > Getting the basic plumbing to a point where we could work together on
> > it / use it elsewhere as soon as you can would be awesome.
> > As soon as I get that I can start on the daemons/scripts.
> > I'll focus on the SE iface and on HBase pushdown for the moment.
> >
> > -david
> >
> > On Mar 13, 2013, at 3:12 PM, Jacques Nadeau <[email protected]> wrote:
> > > I'm working on some physical plan stuff as well as some basic plumbing
> > > for distributed execution. It's very much in progress, so I need to
> > > clean things up a bit before we could collaborate / divide and conquer
> > > on it. Depending on your timing and availability, maybe I could put
> > > some of this together in the next couple of days so that you could
> > > plug in rather than reinvent. In the meantime, pushing forward the
> > > builder stuff, additional test cases on the reference interpreter,
> > > and/or thinking through the logical plan storage engine
> > > pushdown/rewrite could be very useful.
> > >
> > > Let me know your thoughts.
> > >
> > > thanks,
> > > Jacques
> > >
> > > On Wed, Mar 13, 2013 at 9:47 AM, David Alves <[email protected]> wrote:
> > >> Hi Jacques
> > >>
> > >> I can assign issues to myself now, thanks.
> > >> What you say wrt the logical/physical/execution layers sounds good.
> > >> My main concern for the moment is to have something working as fast
> > >> as possible, i.e. some daemons that I'd be able to deploy to a
> > >> working hbase cluster and send them work to do in some form (a first
> > >> step would be to treat it as a non-distributed engine where each
> > >> daemon runs an instance of the prototype).
> > >> Here's where I'd like to go next:
> > >> - lay the groundwork for the daemons (scripts / rpc iface / wiring
> > >>   protocol).
> > >> - create an execution engine iface that allows abstracting future
> > >>   implementations, and make it available through the rpc iface. This
> > >>   would sit in front of the ref impl for now and would be replaced by
> > >>   cpp down the line.
> > >>
> > >> I think we can probably concentrate on the capabilities iface a bit
> > >> down the line but, as a first approach, I see it simply providing a
> > >> simple set of ops that it is able to run internally.
> > >> How to abstract locality/partitioning/schema capabilities is still
> > >> not clear to me though, thoughts?
> > >>
> > >> David
> > >>
> > >> On Mar 13, 2013, at 11:12 AM, Jacques Nadeau <[email protected]> wrote:
> > >>> I'm working on a presentation that will better illustrate the
> > >>> layers. There are actually three key plans. Thinking to date has
> > >>> been to break the plans down into logical, physical, and execution.
> > >>> The third hasn't been expressed well here and is entirely an
> > >>> internal domain of the execution engine. Following some classic
> > >>> methods: Logical expresses what we want to do; Physical expresses
> > >>> how we want to do it (adding points of parallelization but not
> > >>> specifying particular amounts of parallelization or node-by-node
> > >>> assignments). The execution engine is then responsible for
> > >>> determining the amount of parallelization of a particular plan,
> > >>> taking into account system load (likely leveraging Berkeley's
> > >>> Sparrow work), task priority, and specific data locality
> > >>> information, building sub-dags to be assigned to individual nodes,
> > >>> and executing the plan.
> > >>>
> > >>> So at the higher logical and physical levels, a single Scan and
> > >>> subsequent ScanPOP should be okay...
(ScanROPs have a separate problem since they
> > >>> ignore the level of separation we're planning for the real
> > >>> execution layer. This is why the current ref impl turns a single
> > >>> Scan into potentially a union of ScanROPs... not elegant but
> > >>> logically correct.)
> > >>>
> > >>> The capabilities interface still needs to be defined: how a storage
> > >>> engine reveals its logical capabilities and thus consumes part of
> > >>> the plan.
> > >>>
> > >>> J
> > >>>
> > >>> On Tue, Mar 12, 2013 at 10:19 PM, David Alves <[email protected]> wrote:
> > >>>> Hi Lisen
> > >>>>
> > >>>> Some of what you are saying, like pushdown of ops such as filter,
> > >>>> projection, or partial aggregation below the storage engine
> > >>>> scanner level, or sub-tree execution, is actively being discussed
> > >>>> in issues DRILL-13 (Storage Engine Interface) and DRILL-15 (HBase
> > >>>> storage engine); your input on these issues is most welcome.
> > >>>>
> > >>>> HBase in particular has the notion of
> > >>>> endpoints/coprocessors/filters that allow pushing this down easily
> > >>>> (this is also in line with what other parallel-database-over-nosql
> > >>>> implementations like Tajo do).
> > >>>> A possible approach is to have the optimizer change the order of
> > >>>> the ops to place them below the storage engine scanner and let the
> > >>>> SE impl deal with it internally.
> > >>>>
> > >>>> There are also some other pieces missing at the moment AFAIK, like
> > >>>> a distributed metadata store, the drill daemons, wiring, etc.
> > >>>>
> > >>>> So in summary, you're absolutely right, and if you're particularly
> > >>>> interested in the HBase SE impl (as I am, for the moment) I'd be
> > >>>> interested in collaborating.
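[The capabilities interface discussed above is still undefined; as a rough, purely hypothetical sketch (none of these names exist in Drill), a storage engine might advertise which logical operator kinds it can absorb, and the optimizer could then offer it matching plan fragments:]

```java
import java.util.EnumSet;

// Hypothetical sketch only -- illustrative names, not actual Drill classes.
// A storage engine declares which logical operator kinds it can run
// internally, so the optimizer knows which parts of the plan to push down.
public class CapabilitiesSketch {

  // Kinds of logical operators a storage engine might take over.
  enum OpKind { SCAN, FILTER, PROJECT, PARTIAL_AGGREGATE, JOIN }

  // What a storage engine claims it can consume from the plan.
  interface StorageEngineCapabilities {
    EnumSet<OpKind> supportedOps();
    boolean canPushDown(OpKind op);
  }

  // An HBase-flavored engine could absorb scans, filters, and projections
  // (via endpoints/coprocessors/filters); a mysql-backed one could also
  // claim JOIN.
  static class HBaseCapabilities implements StorageEngineCapabilities {
    public EnumSet<OpKind> supportedOps() {
      return EnumSet.of(OpKind.SCAN, OpKind.FILTER, OpKind.PROJECT);
    }
    public boolean canPushDown(OpKind op) {
      return supportedOps().contains(op);
    }
  }

  public static void main(String[] args) {
    StorageEngineCapabilities caps = new HBaseCapabilities();
    System.out.println(caps.canPushDown(OpKind.FILTER)); // true
    System.out.println(caps.canPushDown(OpKind.JOIN));   // false
  }
}
```

[A real interface would presumably also have to express locality, partitioning, and schema capabilities, which, as noted below, is the less clear part.]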
> > >>>>
> > >>>> Best
> > >>>> David
> > >>>>
> > >>>> On Mar 12, 2013, at 11:44 PM, Lisen Mu <[email protected]> wrote:
> > >>>>> Hi David,
> > >>>>>
> > >>>>> Very nice to see your effort on this.
> > >>>>>
> > >>>>> Hi Jacques,
> > >>>>>
> > >>>>> We are also extending the drill prototype, to see if there is any
> > >>>>> chance it meets our production needs. However, we find that
> > >>>>> implementing a performant HBase storage engine is not such
> > >>>>> straightforward work, and requires some workarounds. The problem
> > >>>>> is in the Scan interface.
> > >>>>>
> > >>>>> In drill's physical plan model, ScanROP is in charge of the table
> > >>>>> scan. The storage engine provides output for a whole data source,
> > >>>>> a csv file for example. That's sufficient for an input source
> > >>>>> like a plain file, but for hbase it's not very efficient, if not
> > >>>>> impossible, to let ScanROP retrieve a whole htable into drill.
> > >>>>> Storage engines like HBase should have some ability to do part of
> > >>>>> the DrQL query, like Filter, if a filter can be performed by
> > >>>>> specifying startRowKey and endRowKey. A storage engine like mysql
> > >>>>> could do more, even Join.
> > >>>>>
> > >>>>> Generally, it would be clearer if a ScanROP were mapped to a
> > >>>>> sub-DAG of the logical plan DAG instead of a single Scan node in
> > >>>>> the logical plan. If so, more implementation-specific information
> > >>>>> would couple into the plan optimization & transformation phase. I
> > >>>>> guess that's the price to pay when optimization comes, or is
> > >>>>> there another way I failed to see?
> > >>>>>
> > >>>>> Please correct me if anything is wrong.
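[Lisen's startRowKey/endRowKey point can be illustrated with a toy, dependency-free sketch (a TreeMap stands in for an htable; no real Drill or HBase API is used): pushing the row-key filter below the scanner turns it into scan bounds, so the engine reads a key range instead of the whole table:]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustration only: why row-key filter pushdown matters. A TreeMap
// stands in for a row-key-sorted htable.
public class RowKeyPushdownSketch {

  // A toy "table" keyed by row key.
  static final TreeMap<String, String> TABLE = new TreeMap<>();

  // Full scan + post-filter: what a naive ScanROP would have to do if it
  // must retrieve the whole table into drill first.
  static List<String> scanThenFilter(String startRow, String stopRow) {
    List<String> out = new ArrayList<>();
    for (String key : TABLE.keySet()) {            // touches every row
      if (key.compareTo(startRow) >= 0 && key.compareTo(stopRow) < 0) {
        out.add(TABLE.get(key));
      }
    }
    return out;
  }

  // Pushed-down scan: the filter becomes [startRow, stopRow) bounds,
  // analogous to setting start/stop row keys on an HBase Scan, so only
  // the matching key range is read.
  static List<String> pushedDownScan(String startRow, String stopRow) {
    return new ArrayList<>(TABLE.subMap(startRow, stopRow).values());
  }

  public static void main(String[] args) {
    TABLE.put("row001", "a");
    TABLE.put("row050", "b");
    TABLE.put("row999", "c");
    // Both strategies return the same rows; only the amount read differs.
    System.out.println(scanThenFilter("row000", "row100"));
    System.out.println(pushedDownScan("row000", "row100"));
  }
}
```

[In a real engine the second strategy additionally avoids moving the non-matching rows over the network, which is the efficiency problem Lisen describes.]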
> > >>>>>
> > >>>>> thanks,
> > >>>>>
> > >>>>> Lisen
> > >>>>>
> > >>>>> On Wed, Mar 13, 2013 at 9:33 AM, David Alves <[email protected]> wrote:
> > >>>>>> Hi Jacques
> > >>>>>>
> > >>>>>> I've submitted a first-pass patch to DRILL-15.
> > >>>>>> I did this mostly because HBase will be my main target and
> > >>>>>> because I wanted to get a feel for what would be a nice
> > >>>>>> interface for DRILL-13. I have some thoughts that I will post
> > >>>>>> soon.
> > >>>>>> btw: I still can't assign issues to myself in JIRA, did you
> > >>>>>> forget to add me as a contributor?
> > >>>>>>
> > >>>>>> Best
> > >>>>>> David
> > >>>>>>
> > >>>>>> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <[email protected]> wrote:
> > >>>>>>> Hey David,
> > >>>>>>>
> > >>>>>>> These sound good. I've added you as a contributor on jira so
> > >>>>>>> you can assign tasks to yourself. I think 45 and 46 are good
> > >>>>>>> places to start. 15 depends on 13, and working on the two hand
> > >>>>>>> in hand would probably be a good idea. Maybe we could do a
> > >>>>>>> design discussion on 15 and 13 here once you have some time to
> > >>>>>>> focus on it.
> > >>>>>>>
> > >>>>>>> Jacques
> > >>>>>>>
> > >>>>>>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves <[email protected]> wrote:
> > >>>>>>>> Hi All
> > >>>>>>>>
> > >>>>>>>> I have a new academic project for which I'd like to use drill,
> > >>>>>>>> since none of the other parallel-database-over-hadoop/nosql
> > >>>>>>>> implementations fit just right.
> > >>>>>>>> To this end I've been tinkering with the prototype, trying to
> > >>>>>>>> find where I'd be most useful.
> > >>>>>>>>
> > >>>>>>>> Here's where I'd like to start, if you agree:
> > >>>>>>>> - implement the HBase storage engine (DRILL-15)
> > >>>>>>>>   - start with simple scanning and pushdown of
> > >>>>>>>>     selection/projection
> > >>>>>>>> - implement the LogicalPlanBuilder (DRILL-45)
> > >>>>>>>> - set up coding style in the wiki (formatting/imports etc.,
> > >>>>>>>>   DRILL-46)
> > >>>>>>>> - create builders for all logical plan elements / make logical
> > >>>>>>>>   plans immutable (no issue for this, I'd like to hear your
> > >>>>>>>>   thoughts first).
> > >>>>>>>>
> > >>>>>>>> Please let me know your thoughts, and if you agree please
> > >>>>>>>> assign the issues to me (it seems that I can't assign them
> > >>>>>>>> myself).
> > >>>>>>>>
> > >>>>>>>> Best
> > >>>>>>>> David Alves
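[The "builders for all logical plan elements / make logical plans immutable" idea from David's list might look roughly like this (illustrative names only, not actual Drill classes): the operator is immutable once constructed, and the builder is the only way to build one and enforces required fields:]

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of an immutable logical plan element with a builder.
// Names (ScanOp, storageEngine, selection, projection) are made up for
// illustration.
public class LogicalPlanBuilderSketch {

  static final class ScanOp {
    final String storageEngine;
    final String selection;
    final List<String> projection;

    // Private constructor: instances only come from the builder, and all
    // fields are final, so a built plan element can never change.
    private ScanOp(Builder b) {
      this.storageEngine = b.storageEngine;
      this.selection = b.selection;
      this.projection = Collections.unmodifiableList(b.projection);
    }

    static final class Builder {
      private String storageEngine;
      private String selection;
      private List<String> projection = Collections.emptyList();

      Builder storageEngine(String se) { this.storageEngine = se; return this; }
      Builder selection(String sel)    { this.selection = sel;    return this; }
      Builder projection(List<String> cols) { this.projection = cols; return this; }

      // Validation lives in build(), so a ScanOp is well-formed by
      // construction.
      ScanOp build() {
        if (storageEngine == null) {
          throw new IllegalStateException("scan needs a storage engine");
        }
        return new ScanOp(this);
      }
    }
  }

  public static void main(String[] args) {
    ScanOp scan = new ScanOp.Builder()
        .storageEngine("hbase")
        .selection("rowKey >= 'a'")
        .projection(List.of("f:col1", "f:col2"))
        .build();
    System.out.println(scan.storageEngine + " " + scan.projection);
  }
}
```

[One motivation for this pattern in a planner: immutable plan nodes can be shared freely between optimization passes, and each rewrite produces a new node via a builder instead of mutating the tree in place.]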
