Hi Lisen

        Some of what you are saying, like pushing down ops such as filter,
projection or partial aggregation below the storage engine scanner level, or
sub-tree execution, is actively being discussed in issues DRILL-13 (Storage
Engine Interface) and DRILL-15 (HBase storage engine); your input on these
issues is most welcome.

        HBase in particular has the notion of endpoints/coprocessors/filters
that allow pushing this down easily (this is also in line with what other
parallel database implementations over NoSQL, like Tajo, do).
        A possible approach is to have the optimizer change the order of the
ops to place them below the storage engine scanner and let the SE
implementation handle them internally.
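        As a rough sketch of that approach, assuming a toy plan tree rather
than Drill's real logical plan classes: an optimizer pass walks the plan and,
whenever a Filter or Project sits directly on top of a Scan, asks the storage
engine (through a hypothetical `can_absorb` callback) whether it can evaluate
that operator itself. `Scan`, `Filter`, `Project` and `push_down` are likewise
illustrative names. Absorbed operators become part of the scan, which is
effectively the scan-as-sub-tree idea you describe; refused ones stay above
the scanner.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str
    # Operators the storage engine has agreed to evaluate itself, in order.
    absorbed: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Filter:
    condition: str
    child: "Node"


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Node"


Node = Union[Scan, Filter, Project]


def push_down(node: Node, can_absorb: Callable[[Node], bool]) -> Node:
    """Fold Filter/Project nodes into the Scan directly below them when allowed."""
    if isinstance(node, Scan):
        return node
    # Rewrite bottom-up so operators closest to the scan are offered first.
    child = push_down(node.child, can_absorb)
    if isinstance(child, Scan) and can_absorb(node):
        if isinstance(node, Filter):
            label = f"filter({node.condition})"
        else:
            label = f"project({', '.join(node.columns)})"
        return replace(child, absorbed=child.absorbed + (label,))
    # Engine refused (or child is not a scan): keep the operator above it.
    return replace(node, child=child)
```

For instance, with an engine that only accepts filters,
`Project(("name",), Filter("row_key >= 'user100'", Scan("users")))` becomes a
`Project` directly over a `Scan` that has absorbed the filter. The pass never
moves a Project past a Filter the engine refused, since that filter might
need columns the projection drops. Keeping the decision behind a
`can_absorb`-style hook is one way to keep engine-specific knowledge out of
the core optimizer: a MySQL-like engine could say yes to much more (joins
included) than an HBase one.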

        There are also some other pieces missing at the moment AFAIK, like a
distributed metadata store, the Drill daemons, wiring, etc.

        So in summary, you're absolutely right, and if you're particularly 
interested in the HBase SE impl (as I am, for the moment) I'd be interested in 
collaborating.

Best
David

        
On Mar 12, 2013, at 11:44 PM, Lisen Mu <[email protected]> wrote:

> Hi David,
> 
> Very nice to see your effort on this.
> 
> Hi Jacques,
> 
> We are also extending the Drill prototype, to see whether it can meet our
> production needs. However, we find that implementing a performant HBase
> storage engine is not so straightforward and requires some workarounds.
> The problem is in the Scan interface.
> 
> In Drill's physical plan model, ScanROP is in charge of table scans. The
> storage engine provides output for a whole data source, a CSV file for
> example. That's sufficient for an input source like a plain file, but for
> HBase it's inefficient, if not impossible, to have ScanROP retrieve a whole
> HTable into Drill. Storage engines like HBase should have some ability to
> perform part of the DrQL query, like a Filter, if the filter can be
> expressed by specifying startRowKey and endRowKey. A storage engine like
> MySQL could do even more, such as a Join.
> 
> Generally, it would be clearer if a ScanROP were mapped to a sub-DAG of the
> logical plan DAG instead of a single Scan node in the logical plan. If so,
> more implementation-specific information would be coupled into the plan
> optimization and transformation phase. I guess that's the price to pay for
> optimization, or is there another way I've failed to see?
> 
> Please correct me if anything is wrong.
> 
> thanks,
> 
> Lisen
> 
> 
> 
> On Wed, Mar 13, 2013 at 9:33 AM, David Alves <[email protected]> wrote:
> 
>> Hi Jacques
>> 
>>        I've submitted a first-pass patch to DRILL-15.
>>        I did this mostly because HBase will be my main target and because
>> I wanted to get a feel for what would be a nice interface for DRILL-13. I
>> have some thoughts that I will post soon.
>>        btw: I still can't assign issues to myself in JIRA; did you forget
>> to add me as a contributor?
>> 
>> Best
>> David
>> 
>> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <[email protected]> wrote:
>> 
>>> Hey David,
>>> 
>>> These sound good.  I've added you as a contributor on JIRA so you can
>>> assign tasks to yourself.  I think 45 and 46 are good places to start.
>>> 15 depends on 13, and working on the two hand in hand would probably be
>>> a good idea. Maybe we could do a design discussion on 15 and 13 here
>>> once you have some time to focus on it.
>>> 
>>> Jacques
>>> 
>>> 
>>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves <[email protected]> wrote:
>>> 
>>>> Hi All
>>>> 
>>>>       I have a new academic project for which I'd like to use Drill,
>>>> since none of the other parallel database implementations over
>>>> Hadoop/NoSQL fit just right.
>>>>       To this end I've been tinkering with the prototype, trying to
>>>> find where I'd be most useful.
>>>> 
>>>>       Here's where I'd like to start, if you agree:
>>>>       - implement the HBase storage engine (DRILL-15)
>>>>               - start with simple scanning and push-down of
>>>>                 selection/projection
>>>>       - implement the LogicalPlanBuilder (DRILL-45)
>>>>       - set up coding style in the wiki (formatting/imports etc., DRILL-46)
>>>>       - create builders for all logical plan elements/make logical plans
>>>>         immutable (no issue for this; I'd like to hear your thoughts first).
>>>> 
>>>>       Please let me know your thoughts, and if you agree please assign
>>>> the issues to me (it seems that I can't assign them myself).
>>>> 
>>>> Best
>>>> David Alves
>> 
>> 
