It would be awesome if you could get the basic plumbing to a point where we can work on it together (or use it elsewhere) as soon as possible. Once I have that, I can start on the daemons/scripts. For the moment I'll focus on the storage engine (SE) iface and on HBase pushdown.

-david

On Mar 13, 2013, at 3:12 PM, Jacques Nadeau <[email protected]> wrote:

> I'm working on some physical plan stuff as well as some basic plumbing for distributed execution. It's very much in progress, so I need to clean things up a bit before we can collaborate/divide and conquer on it. Depending on your timing and availability, maybe I could put some of this together in the next couple of days so that you could plug in rather than reinvent. In the meantime, pushing forward the builder stuff, adding test cases for the reference interpreter, and/or thinking through the logical plan storage engine pushdown/rewrite could be very useful.
>
> Let me know your thoughts.
>
> thanks,
> Jacques
>
> On Wed, Mar 13, 2013 at 9:47 AM, David Alves <[email protected]> wrote:
>
>> Hi Jacques
>>
>> I can assign issues to myself now, thanks.
>> What you say about the logical/physical/execution layers sounds good.
>> My main concern for the moment is to have something working as fast as possible, i.e. some daemons that I'd be able to deploy to a working HBase cluster and send work to in some form (the first step would be to treat it as a non-distributed engine where each daemon runs an instance of the prototype).
>> Here's where I'd like to go next:
>> - lay the groundwork for the daemons (scripts/RPC iface/wire protocol).
>> - create an execution engine iface that lets us abstract future implementations, and make it available through the RPC iface. This would sit in front of the ref impl for now and would be replaced by a C++ implementation down the line.
>>
>> I think we can probably concentrate on the capabilities iface a bit further down the line but, as a first approach, I see it simply exposing the set of ops it is able to run internally.
>> How to abstract locality/partitioning/schema capabilities is still not clear to me, though. Thoughts?
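The execution engine iface David proposes (an abstraction the RPC layer talks to, with the reference implementation behind it for now) could look roughly like the Python sketch below. All names (`ExecutionEngine`, `ReferenceEngine`, `supported_ops`, `handle_request`) are hypothetical, not Drill's API. The plan is simplified to a linear list of steps rather than a DAG. `supported_ops` stands in for the simplest form of the capabilities iface mentioned above, a set of op names the engine can run internally.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, FrozenSet, List

Record = Dict[str, Any]
Step = Dict[str, Any]  # hypothetical plan step, e.g. {"op": "filter", "predicate": ...}


class ExecutionEngine(ABC):
    """Hypothetical engine abstraction: the RPC layer would only talk to this."""

    @abstractmethod
    def supported_ops(self) -> FrozenSet[str]:
        """Simplest form of a capabilities iface: the ops this engine runs internally."""

    @abstractmethod
    def execute(self, plan: List[Step]) -> List[Record]:
        """Run a (here: linear) plan and return its output records."""


class ReferenceEngine(ExecutionEngine):
    """Toy stand-in for the reference interpreter; another engine could replace it later."""

    def supported_ops(self) -> FrozenSet[str]:
        return frozenset({"scan", "filter", "project"})

    def execute(self, plan: List[Step]) -> List[Record]:
        # Reject the whole plan up front if any step is outside our capabilities.
        unsupported = [s["op"] for s in plan if s["op"] not in self.supported_ops()]
        if unsupported:
            raise ValueError(f"unsupported ops: {unsupported}")
        rows: List[Record] = []
        for step in plan:
            if step["op"] == "scan":
                rows = list(step["rows"])
            elif step["op"] == "filter":
                keep: Callable[[Record], bool] = step["predicate"]
                rows = [r for r in rows if keep(r)]
            else:  # "project"
                rows = [{f: r[f] for f in step["fields"]} for r in rows]
        return rows


def handle_request(engine: ExecutionEngine, plan: List[Step]) -> List[Record]:
    """What a daemon's RPC handler might do: delegate to whichever engine is plugged in."""
    return engine.execute(plan)
```

Swapping in a different engine would only require another `ExecutionEngine` subclass. The hypothetical `handle_request` entry point stays unchanged.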
>>
>> David
>>
>> On Mar 13, 2013, at 11:12 AM, Jacques Nadeau <[email protected]> wrote:
>>
>>> I'm working on a presentation that will better illustrate the layers. There are actually three key plans. Thinking to date has been to break the plans down into logical, physical and execution. The third hasn't been expressed well here and is entirely internal to the execution engine. Following some classic methods: the logical plan expresses what we want to do, and the physical plan expresses how we want to do it (adding points of parallelization but not specifying particular amounts of parallelization or node-by-node assignments). The execution engine is then responsible for determining the amount of parallelization of a particular plan, taking into account system load (likely leveraging Berkeley's Sparrow work), task priority and specific data locality information, building sub-DAGs to be assigned to individual nodes, and executing the plan.
>>>
>>> So at the higher logical and physical levels, a single Scan and subsequent ScanPOP should be okay... (ScanROPs have a separate problem, since they ignore the level of separation we're planning for the real execution layer. This is why the current ref impl turns a single Scan into potentially a union of ScanROPs... not elegant but logically correct.)
>>>
>>> The capabilities interface, through which a storage engine reveals its logical capabilities and thus consumes part of the plan, still needs to be defined.
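The physical plan only marks where a scan can be parallelized. The execution layer picks the actual width and the node-by-node assignment. The sketch below is a hypothetical, deliberately simple greedy heuristic: take the least-loaded nodes first, then prefer data-local ones. It is not Sparrow's scheduling algorithm and not Drill's planner. `Split`, `assign_fragments`, and the integer load values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Split:
    """Hypothetical unit of scan work, e.g. one HBase region."""
    name: str
    hosts: Tuple[str, ...]  # nodes that hold this split's data locally


def assign_fragments(
    splits: List[Split], load: Dict[str, int], max_width: int
) -> Dict[str, List[str]]:
    """Pick a parallelization width and map splits to nodes.

    Greedy heuristic: use the `width` least-loaded nodes, then give each split
    to the least-loaded of those nodes that is local to it (any of them if none is).
    """
    if not load:
        raise ValueError("no nodes available")
    width = max(1, min(max_width, len(splits)))
    # Ties are broken by node name so the result is deterministic.
    nodes = sorted(load, key=lambda n: (load[n], n))[:width]
    current = {n: load[n] for n in nodes}
    fragments: Dict[str, List[str]] = {n: [] for n in nodes}
    for split in splits:
        local = [n for n in nodes if n in split.hosts]
        target = min(local or nodes, key=lambda n: (current[n], n))
        fragments[target].append(split.name)
        current[target] += 1  # account for the work just assigned
    return {n: names for n, names in fragments.items() if names}
```

Consider regions `r1` (on `n1`), `r2` (on `n2`) and `r3` (on `n1` and `n3`), with loads `{"n1": 0, "n2": 0, "n3": 5}` and `max_width=2`. The heavily loaded `n3` is left out, and the result is `{"n1": ["r1", "r3"], "n2": ["r2"]}`.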
>>>
>>> J
>>>
>>> On Tue, Mar 12, 2013 at 10:19 PM, David Alves <[email protected]> wrote:
>>>
>>>> Hi Lisen
>>>>
>>>> Some of what you are describing, such as pushing ops like filter, projection or partial aggregation below the storage engine scanner level, or sub-tree execution, is actively being discussed in DRILL-13 (Storage Engine Interface) and DRILL-15 (HBase storage engine); your input on these issues is most welcome.
>>>>
>>>> HBase in particular has the notion of endpoints/coprocessors/filters, which make pushing this down easy (this is also in line with what other parallel-database-over-NoSQL implementations like Tajo do).
>>>> A possible approach is to have the optimizer reorder the ops to place them below the storage engine scanner and let the SE impl deal with them internally.
>>>>
>>>> There are also some other pieces missing at the moment AFAIK, like a distributed metadata store, the Drill daemons, wiring, etc.
>>>>
>>>> So in summary, you're absolutely right, and if you're particularly interested in the HBase SE impl (as I am, for the moment) I'd be interested in collaborating.
>>>>
>>>> Best
>>>> David
>>>>
>>>> On Mar 12, 2013, at 11:44 PM, Lisen Mu <[email protected]> wrote:
>>>>
>>>>> Hi David,
>>>>>
>>>>> Very nice to see your effort on this.
>>>>>
>>>>> Hi Jacques,
>>>>>
>>>>> We are also extending the Drill prototype to see if there is any chance it can meet our production needs. However, we find that implementing a performant HBase storage engine is not so straightforward and requires some workarounds. The problem is in the Scan interface.
>>>>>
>>>>> In Drill's physical plan model, ScanROP is in charge of table scans. The storage engine provides output for a whole data source, a CSV file for example.
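David's suggested rewrite has the optimizer move ops below the storage engine scanner and lets the SE impl handle them. It can be sketched as a rule driven by the storage engine's capabilities. This is a hypothetical Python sketch over a linear plan. `push_down`, the dict-based op format, and the `pushed` field are assumptions. The rule only folds ops that already sit directly above the scan. Actually reordering ops, for instance moving a filter below a project, would need semantic checks that are not shown here.

```python
from typing import Any, Dict, FrozenSet, List

Op = Dict[str, Any]  # hypothetical plan node, e.g. {"op": "filter", "expr": "..."}


def push_down(plan: List[Op], capabilities: FrozenSet[str]) -> List[Op]:
    """Fold the ops sitting directly above the scan into the scan node.

    Stops at the first op the storage engine cannot run, so the relative
    order of ops is never changed; the input plan is not mutated.
    """
    if not plan or plan[0]["op"] != "scan":
        return list(plan)
    # Copy the scan node (and its pushed list) instead of editing it in place.
    scan = dict(plan[0], pushed=list(plan[0].get("pushed", [])))
    i = 1
    while i < len(plan) and plan[i]["op"] in capabilities:
        scan["pushed"].append(plan[i])
        i += 1
    return [scan] + plan[i:]
```

With HBase-like capabilities `{"filter", "project"}`, a scan → filter → project → aggregate plan collapses to scan → aggregate. With only `{"project"}`, the filter blocks any pushdown.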
>>>>> That's sufficient for input sources like plain files, but for HBase it's very inefficient, if not impossible, to let ScanROP retrieve a whole HTable into Drill. Storage engines like HBase should be able to execute part of the DrQL query, such as a Filter, when the filter can be performed by specifying a startRowKey and endRowKey. A storage engine like MySQL could do even more, including Joins.
>>>>>
>>>>> Generally, it would be clearer if a ScanROP were mapped to a sub-DAG of the logical plan DAG instead of a single Scan node in the logical plan. If so, more implementation-specific information would be coupled into the plan optimization & transformation phase. I guess that's the price to pay for optimization, or is there another way I've failed to see?
>>>>>
>>>>> Please correct me if anything is wrong.
>>>>>
>>>>> thanks,
>>>>>
>>>>> Lisen
>>>>>
>>>>> On Wed, Mar 13, 2013 at 9:33 AM, David Alves <[email protected]> wrote:
>>>>>
>>>>>> Hi Jacques
>>>>>>
>>>>>> I've submitted a first-pass patch to DRILL-15.
>>>>>> I did this mostly because HBase will be my main target and because I wanted to get a feel for what would be a nice interface for DRILL-13. I have some thoughts that I will post soon.
>>>>>> btw: I still can't assign issues to myself in JIRA; did you forget to add me as a contributor?
>>>>>>
>>>>>> Best
>>>>>> David
>>>>>>
>>>>>> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <[email protected]> wrote:
>>>>>>
>>>>>>> Hey David,
>>>>>>>
>>>>>>> These sound good. I've added you as a contributor on JIRA so you can assign tasks to yourself. I think 45 and 46 are good places to start. 15 depends on 13, and working on the two hand in hand would probably be a good idea. Maybe we could have a design discussion on 15 and 13 here once you have some time to focus on it.
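Lisen's point about performing a filter by specifying a start and end row key works for two reasons. HBase keeps rows sorted by their raw byte keys, and a scan can be bounded by a start row (inclusive) and a stop row (exclusive). The hypothetical Python sketch below turns a conjunction of comparisons on the row key into such bounds. It then simulates the bounded scan over a sorted key list. `row_key_range` and `scan_range` are illustrative names, not HBase client calls. Predicates on other columns would still need HBase filters or a Filter op after the scan.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Comparison = Tuple[str, bytes]  # hypothetical (operator, literal) applied to the row key

_LOWER = {">=", ">", "=="}
_UPPER = {"<", "<=", "=="}


def row_key_range(preds: List[Comparison]) -> Tuple[bytes, Optional[bytes]]:
    """AND together row-key comparisons into (start_inclusive, stop_exclusive).

    Keys compare as raw bytes; stop=None means "to the end of the table".
    Appending a zero byte to a key gives the smallest key strictly greater than it.
    """
    start: bytes = b""
    stop: Optional[bytes] = None
    for op, value in preds:
        if op not in _LOWER | _UPPER:
            raise ValueError(f"not a row-key range operator: {op!r}")
        if op in _LOWER:
            # Tighten the lower bound; "> x" starts just after x.
            start = max(start, value + b"\x00" if op == ">" else value)
        if op in _UPPER:
            # Tighten the upper bound; "<= x" and "== x" stop just after x.
            hi = value if op == "<" else value + b"\x00"
            stop = hi if stop is None else min(stop, hi)
    return start, stop


def scan_range(sorted_keys: List[bytes], start: bytes, stop: Optional[bytes]) -> List[bytes]:
    """Simulate a bounded range scan over sorted keys instead of reading the whole table."""
    lo = bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]
```

For example, `k > b"a" AND k <= b"c"` becomes the range `[b"a\x00", b"c\x00")`. Over the keys `a`, `b`, `c`, `d` it reads only `b` and `c`.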
>>>>>>>
>>>>>>> Jacques
>>>>>>>
>>>>>>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All
>>>>>>>>
>>>>>>>> I have a new academic project for which I'd like to use Drill, since none of the other parallel-database-over-Hadoop/NoSQL implementations fit just right.
>>>>>>>> To this end I've been tinkering with the prototype, trying to find where I'd be most useful.
>>>>>>>>
>>>>>>>> Here's where I'd like to start, if you agree:
>>>>>>>> - implement the HBase storage engine (DRILL-15)
>>>>>>>>   - start with simple scanning and pushdown of selection/projection
>>>>>>>> - implement the LogicalPlanBuilder (DRILL-45)
>>>>>>>> - set up coding style in the wiki (formatting/imports etc., DRILL-46)
>>>>>>>> - create builders for all logical plan elements/make logical plans immutable (no issue for this; I'd like to hear your thoughts first).
>>>>>>>>
>>>>>>>> Please let me know your thoughts, and if you agree please assign the issues to me (it seems that I can't assign them myself).
>>>>>>>>
>>>>>>>> Best
>>>>>>>> David Alves
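David's last proposal pairs builders for logical plan elements with immutable logical plans. It can be illustrated with a mutable builder that hands out frozen snapshots. This is a hypothetical Python sketch, and `LogicalOperator`, `LogicalPlan`, and `PlanBuilder` are illustrative names. It is not the `LogicalPlanBuilder` from DRILL-45.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class LogicalOperator:
    """Hypothetical immutable plan element; params are stored as sorted key/value pairs."""
    op: str
    params: Tuple[Tuple[str, Any], ...] = ()


@dataclass(frozen=True)
class LogicalPlan:
    """Hypothetical immutable plan: a fixed tuple of operators."""
    operators: Tuple[LogicalOperator, ...]


class PlanBuilder:
    """Mutable builder that produces immutable LogicalPlan snapshots."""

    def __init__(self) -> None:
        self._ops: List[LogicalOperator] = []

    def add(self, op: str, **params: Any) -> "PlanBuilder":
        self._ops.append(LogicalOperator(op, tuple(sorted(params.items()))))
        return self  # allows chaining: builder.add(...).add(...)

    def build(self) -> LogicalPlan:
        if not self._ops:
            raise ValueError("a plan needs at least one operator")
        return LogicalPlan(tuple(self._ops))  # copy, so later add() calls don't leak in
```

`build()` copies the operator list into a tuple, so a plan that has already been built is unaffected by later `add()` calls. Frozen dataclasses also reject attribute assignment, so built plans cannot be edited in place.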
