David,

Suppose we plan to use domainId+uid+timestamp as our HTable rowkey.
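To make that layout concrete, here is a minimal sketch in plain Java (no HBase dependency) of how such a composite key could be encoded, how the uid portion could be read back, and how a scan range falls out of a known domainId prefix. Everything in it is an assumption for illustration only: the class and method names are hypothetical, the 0x00 separator and the 8-byte big-endian timestamp are one possible encoding, and none of it is Drill or HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Drill or HBase.
final class CompositeRowKey {
    // Assumption: domainId and uid never contain this byte.
    static final byte SEP = 0x00;

    // Layout: domainId | SEP | uid | SEP | timestamp (8 bytes, big-endian).
    static byte[] encode(String domainId, String uid, long timestamp) {
        byte[] d = domainId.getBytes(StandardCharsets.UTF_8);
        byte[] u = uid.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(d.length + u.length + 2 + Long.BYTES)
                .put(d).put(SEP).put(u).put(SEP).putLong(timestamp)
                .array();
    }

    // Inclusive start key for all rows of one domainId: domainId | SEP.
    static byte[] startKeyForDomain(String domainId) {
        byte[] d = domainId.getBytes(StandardCharsets.UTF_8);
        byte[] start = Arrays.copyOf(d, d.length + 1);
        start[d.length] = SEP;
        return start;
    }

    // Exclusive stop key: same prefix with the separator byte incremented.
    static byte[] stopKeyForDomain(String domainId) {
        byte[] stop = startKeyForDomain(domainId);
        stop[stop.length - 1] = (byte) (SEP + 1);
        return stop;
    }

    // Extracts the uid, i.e. the bytes between the first and second separator.
    static String decodeUid(byte[] rowKey) {
        int first = indexOf(rowKey, SEP, 0);
        int second = indexOf(rowKey, SEP, first + 1);
        return new String(rowKey, first + 1, second - first - 1, StandardCharsets.UTF_8);
    }

    // Row keys are compared as unsigned bytes, lexicographically.
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(key, start) >= 0
                && Arrays.compareUnsigned(key, stop) < 0;
    }

    private static int indexOf(byte[] a, byte b, int from) {
        for (int i = from; i < a.length; i++) {
            if (a[i] == b) {
                return i;
            }
        }
        throw new IllegalArgumentException("separator not found");
    }
}
```

With a layout like this, a predicate on domainId alone bounds the key range, whereas a predicate on uid alone does not, because uid is not the leading part of the key; that case would presumably need a full scan plus a filter, or a different key order.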
I would like to retrieve the uid portion of the rowkey, for example:

    SELECT distinct(uid) from `my_table` where xxx

Or I would like to be able to run:

    a) SELECT xxx from `my_table` where domainId='a'
    b) SELECT xxx from `my_table` where uid='[email protected]'

and have the HBase SE determine the best startKey and endKey from the rowkey definition info, so that a) and b) would perform differently.

> about selection/Filter & aggregation:

I have so many questions that I feel it would be better to wait for your HBase SE first... However: how do we push aggregation and selection down into the scan pop?

@Jacques, it seems to me that your idea is to use a scan pop node to describe what the SE would do in a query, right? Wouldn't the scan pop become a little too complicated if it has to stay SE independent, since MySQL and Mongo need more from a scan pop? Previously I thought you would provide something like

    RecordReader getReader(PhysicalPlan subPlan)

where the SE advertises its abilities back to Drill, Drill pushes part of the physical plan to the SE, and the SE figures out how to deal with the sub-DAG, as long as the SE can provide a correct RecordBatch.

On Mon, Apr 22, 2013 at 12:06 PM, David Alves <[email protected]> wrote:

> Hi Lisen
>
> Phoenix has been a good source of inspiration.
> Had it not been for license issues (non-standard license) and the
> fact that it is designed to run locally, I would have used it directly
> instead of coding my own.
> Not completely sure what you mean wrt "map fields in the query
> into portion of rowkey in HBase", but here's what I'm doing with regard to
> the operations that are pushed to HBase:
>
> Projection comes from setting the interesting CFs and CQs in the
> Scan prior to starting it (where those come from in Drill was the reason
> for my previous email).
> Selection comes from setting Filters that are created directly
> from expressions in Drill and are submitted with the scan.
> Partial Aggregation (which I'm not doing right now but will do
> soon) will come from co-processors.
> Joins: I'm investigating a couple of options for pushing some of the
> work to HBase.
>
> All the remaining operations will happen within Drill itself.
>
> Best
> David
>
> On Apr 21, 2013, at 10:45 PM, Lisen Mu <[email protected]> wrote:
>
> > David,
> >
> > Another case about schema: how to map fields in the query into portions
> > of the rowkey in HBase? Like Phoenix does.
> > http://files.meetup.com/1350427/IntelPhoenixHBaseMeetup.ppt
> >
> > I think it might be common in HBase schema design that several logical
> > parts form the rowkey in a particular order for the most frequent access
> > pattern.
> >
> > On Sun, Apr 21, 2013 at 1:45 PM, David Alves <[email protected]> wrote:
> >
> >> I had a "duh" moment, realizing that, of course, I don't need a
> >> ProjectFilter as I can set the relevant CQs and CFs on HBase's Scan.
> >> The question of how to get the names of the columns the query is asking
> >> for, or even "*" if that is the case, still stands though…
> >>
> >> -david
> >>
> >> On Apr 20, 2013, at 10:39 PM, David Alves <[email protected]> wrote:
> >>
> >>> Hi Jacques
> >>>
> >>> I'm implementing a ProjectFilter for HBase and I got to the point
> >>> where I need to pass to HBase the fields that are required (even if
> >>> it's simply "all" as in *).
> >>> How do I know which fields to scan in the SE and their expected type?
> >>> There's a bunch of schema stuff in org/apache/drill/exec/schema
> >>> but I can't figure out how the SE uses that.
> >>> Will this info come inside the scan logical op in
> >>> getReadEntries(Scan scan) (in the arbitrary "selection" section)?
> >>> Is this method still going to receive a logical Scan op or is this
> >>> just legacy stuff that you didn't have the chance to get to yet?
> >>> BatchSchema seems to only refer to field ids…
> >>>
> >>> I'm thinking this is most likely because the work is still very
> >>> much in progress, but as I browse the code I can see you have put a
> >>> lot of thought into almost everything, even when it's not being used
> >>> right now, and I don't want to make any stupid assumptions.
> >>> I can definitely make that info get to the SE iface myself, just
> >>> wondering how you envision it should get there…
> >>>
> >>> Best
> >>> David
