David,

Thanks, I'm willing to help.
Sorry I missed the conclusion in the JIRA. Thanks for the explanation; I guess a further push from you and Jacques would make things clearer.

On Mon, Apr 22, 2013 at 12:47 PM, David Alves <[email protected]> wrote:

> Lisen
>
> Ah, got what you mean by encoding multiple fields into rowkey.
> Well that makes projection trickier, but still definitely possible
> to do with Filters.
> As soon as I get something reasonable working I'll push it and I
> welcome your help in dealing with that particular situation and any
> others you can come up with.
>
> With regard to pushdown, after a bit of discussion in the SE
> jira (I forget the number) the consensus seems to be that the SE
> advertises opaque OptimizerRules that the optimizer runs.
> These can, for instance, push the project in Jacques' example inside
> the scan, or change the order of ops.
> In general I can see the case where a typical RDBMS would publish
> multiple rules (for agg, proj, select, even join) which, when run by
> the optimizer, would go through the ops directly above the scan and
> keep pushing most of them inside the scan until there is either
> nothing left but the sink and the scan (and not even the sink if it
> goes into the same data source) or there's a multi-branch,
> multi-data-source op such as union or join.
> All of these are inside the Scan physical op (and are SE agnostic
> up to this point).
> So the physical plan portion to be executed by the SE is actually
> inside the scan op.
> At least this is how I'm thinking about it right now…
>
> Best
> David
>
>
> On Apr 21, 2013, at 11:29 PM, Lisen Mu <[email protected]> wrote:
>
> > David,
> >
> > Suppose we have planned to use domainId+uid+timestamp as my HTable
> > rowkey.
> >
> > I wish to retrieve the uid portion from my rowkey, like:
> >
> > SELECT distinct(uid) from `my_table` where xxx
> >
> > Or, I wish I could do:
> >
> > a) SELECT xxx from `my_table` where domainId='a'
> > b) SELECT xxx from `my_table` where uid='[email protected]'
> >
> > And HBase SE would determine the best startKey and endKey according to
> > rowkey definition info, so a) and b) would get different performance.
> >
> >> about selection/Filter & aggregation:
> >
> > I have so many questions that I feel it would be better to wait for
> > your HBase SE first... However:
> >
> > How to push down aggregation and selection into scan pop?
> >
> > @Jacques, It seems to me that your idea is to use a scan pop node to
> > describe what SE would do in a query, right?
> >
> > Would scan pop become a little too complicated if scan pop stays SE
> > independent? Since mysql & mongo need more for scan pop.
> >
> > Previously I thought you would provide something like
> >
> > RecordReader getReader(PhysicalPlan subPlan)
> >
> > SE advertises ability back to drill, drill pushes part of the physical
> > plan to SE and lets SE figure out how to deal with the subdag as long
> > as SE can provide correct RecordBatch.
> >
> >
> > On Mon, Apr 22, 2013 at 12:06 PM, David Alves <[email protected]>
> > wrote:
> >
> >> Hi Lisen
> >>
> >> Phoenix has been a good source of inspiration.
> >> Had it not been for license issues (non-standard license) and the
> >> fact it is designed to run locally I would have used it directly
> >> instead of coding my own.
> >> Not completely sure what you mean wrt "map fields in the query
> >> into portion of rowkey in HBase" but here's what I'm doing with
> >> regard to the operations that are pushed to HBase:
> >>
> >> Projection comes from setting the interesting CF's and CQ's in
> >> the Scan prior to starting it (where those come from in drill was
> >> the reason for my previous email).
> >> Selection comes from setting Filters that are created directly
> >> from expressions in Drill and are submitted with the scan.
> >> Partial Aggregation (which I'm not doing right now but will do
> >> soon) will come from co-processors.
> >> Joins: I'm investigating a couple of options for pushing some of
> >> the work to HBase.
> >>
> >> All the remaining operations will happen within drill itself.
> >>
> >> Best
> >> David
> >>
> >> On Apr 21, 2013, at 10:45 PM, Lisen Mu <[email protected]> wrote:
> >>
> >>> David,
> >>>
> >>> Another case about schema: how to map fields in the query into
> >>> portion of rowkey in HBase? Like phoenix does.
> >>> http://files.meetup.com/1350427/IntelPhoenixHBaseMeetup.ppt
> >>>
> >>> I think it might be common in HBase schema design that several
> >>> logical parts form the rowkey in a particular order for the most
> >>> frequent access pattern.
> >>>
> >>>
> >>> On Sun, Apr 21, 2013 at 1:45 PM, David Alves <[email protected]>
> >>> wrote:
> >>>
> >>>> had a "duh" moment, realizing that, of course, I don't need a
> >>>> ProjectFilter as I can set the relevant cq's and cf's on HBase's
> >>>> Scan.
> >>>> the question of how to get the names of the columns the query is
> >>>> asking for, or even "*" if that is the case, still stands though…
> >>>>
> >>>> -david
> >>>>
> >>>> On Apr 20, 2013, at 10:39 PM, David Alves <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Hi Jacques
> >>>>>
> >>>>> I'm implementing a ProjectFilter for HBase and I got to the point
> >>>>> where I need to pass to HBase the fields that are required (even
> >>>>> if it's simply "all" as in *).
> >>>>> How to know which fields to scan in the SE and their expected
> >>>>> type?
> >>>>> There's a bunch of schema stuff in the
> >>>>> org/apache/drill/exec/schema but I can't figure how SE uses that.
> >>>>> Will this info come inside the scan logical op in
> >>>>> getReadEntries(Scan scan) (in the arbitrary "selection" section)?
> >>>>> Is this method still going to receive a logical Scan op or is
> >>>>> this just legacy stuff that you didn't have the chance to get to
> >>>>> yet?
> >>>>> BatchSchema seems to only refer to field ids…
> >>>>>
> >>>>> I'm thinking this is most likely because the work is still very
> >>>>> much in progress but as I browse the code I can see you have put
> >>>>> a lot of thought into almost everything even when it's not being
> >>>>> used right now and I don't want to make any stupid assumption.
> >>>>> I can definitely make that info get to the SE iface myself, just
> >>>>> wondering how you envision it should get there…
> >>>>>
> >>>>> Best
> >>>>> David
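
To make the rowkey point in the thread concrete, here is a minimal sketch of how an SE might derive startKey and endKey for queries a) and b). It assumes a layout that nobody in the thread specified: domainId, uid and timestamp joined by a 0x00 separator. The class and method names (RowKeyRange, forDomainId, forUid, stopAfterPrefix) are hypothetical and are not part of Drill or HBase. It relies only on the HBase conventions that the stop row is exclusive and that an empty row means "unbounded".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: scan bounds for a predicate on a composite rowkey. */
final class RowKeyRange {
    // Assumed layout (not from the thread): domainId 0x00 uid 0x00 timestamp
    static final byte SEP = 0x00;
    // An empty row stands for "no bound" on either side of the scan.
    static final byte[] EMPTY = new byte[0];

    final byte[] start;        // inclusive lower bound
    final byte[] stop;         // exclusive upper bound
    final boolean needsFilter; // true if rows must still be filtered server side

    RowKeyRange(byte[] start, byte[] stop, boolean needsFilter) {
        this.start = start;
        this.stop = stop;
        this.needsFilter = needsFilter;
    }

    /** Smallest key that sorts after every key beginning with prefix. */
    static byte[] stopAfterPrefix(byte[] prefix) {
        // Skip trailing 0xFF bytes, then increment the last remaining byte.
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return EMPTY; // prefix was all 0xFF: no finite upper bound
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }

    /** Query a): domainId is the leading key part, so a tight range exists. */
    static RowKeyRange forDomainId(String domainId) {
        byte[] d = domainId.getBytes(StandardCharsets.UTF_8);
        byte[] prefix = Arrays.copyOf(d, d.length + 1);
        prefix[d.length] = SEP;
        return new RowKeyRange(prefix, stopAfterPrefix(prefix), false);
    }

    /** Query b): uid is not a leading part, so the whole table is scanned. */
    static RowKeyRange forUid(String uid) {
        return new RowKeyRange(EMPTY, EMPTY, true);
    }
}
```

Under these assumptions, forDomainId("a") gives a start of 61 00 and a stop of 61 01, while forUid(...) gives an unbounded range plus a filter, which is why a) and b) would perform differently.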

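
The pushdown idea David describes (rules that keep moving the ops directly above the scan into it until a multi-branch op or the sink is reached) can be sketched as below. This is only an illustration under simplifying assumptions: the plan is a single linear chain, ops are plain strings, and the set of pushable kinds is made up. ScanPushdown and its members are hypothetical names, not Drill's OptimizerRules API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of pushing ops into a scan; not Drill's actual API. */
final class ScanPushdown {
    // Assumed op kinds that the storage engine advertises as pushable.
    static final List<String> PUSHABLE = Arrays.asList("project", "select", "agg");

    final List<String> insideScan = new ArrayList<>(); // absorbed by the scan
    final List<String> remaining = new ArrayList<>();  // left for Drill to run

    /**
     * opsAboveScan is ordered bottom-up: index 0 sits directly above the scan.
     * Ops are absorbed in order until the first non-pushable op (for example
     * a join, a union, or the sink) is reached.
     */
    static ScanPushdown plan(List<String> opsAboveScan) {
        ScanPushdown result = new ScanPushdown();
        int i = 0;
        while (i < opsAboveScan.size() && PUSHABLE.contains(opsAboveScan.get(i))) {
            result.insideScan.add(opsAboveScan.get(i));
            i++;
        }
        result.remaining.addAll(opsAboveScan.subList(i, opsAboveScan.size()));
        return result;
    }
}
```

For a chain of select, project, join, sink above the scan, this leaves select and project inside the scan and join and sink for Drill to execute.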