Hi Luoc,

There is a pending PR, DRILL-7985 [1], that we should merge before doing any additional work on the HBase plugin. This PR introduces a new framework for pushdowns, which will make it much easier to implement pushdowns for the various storage plugins. I would recommend reading the docs for it, as the framework should let us make the HBase plugin a lot more robust than it currently is.
Best,
-- C

1: https://github.com/apache/drill/pull/2289

> On Aug 25, 2021, at 10:43 AM, luoc <[email protected]> wrote:
>
> Thanks for the feedback. Apache HBase and Apache Phoenix are an important
> part of my work. I'm not sure whether anyone has started the `HBase to
> EVF` work for Drill, but this improvement is valuable.
> In particular, I found a big improvement over the Phoenix 4.x and HBase 1.x
> series when I recently used Phoenix 5.1 + HBase 2.3 on Hadoop 3.3.
> I look forward to seeing Drill benefit from these advantages.
>
>> On Aug 24, 2021, at 23:16, Ted Dunning <[email protected]> wrote:
>>
>> I know somebody who is querying a very large table and has trouble with
>> pushdown.
>>
>> They are looking for values indexed by primary key with a query like
>> "select * from table where key in s". If s has a very small number of
>> values, this turns into primary key access, but if there are more than just
>> a few, it becomes a scan.
>>
>> The situation that would be interesting to detect is where s has a few
>> tightly clustered groups. The ideal strategy would be to scan each group.
>> How this might be detected isn't clear to me, but it would make a massive
>> difference to this kind of query.
>>
>> Currently, the best alternative is to try to avoid this kind of query and
>> build a data flow such that each cluster of keys flows into a separate
>> query. This would be made easier if a common table expression (CTE) query
>> could be done without having the optimizer try to globally optimize back to
>> a single big scan.
>>
>> Anyway, I have absolutely no concrete suggestions for making this work, but
>> the need is there.
>>
>>
>>> On Tue, Aug 24, 2021 at 4:39 AM luoc <[email protected]> wrote:
>>>
>>> Hello Guys,
>>> Do you use Drill to query Apache HBase? If so, what new features would
>>> you like to see in the HBase storage plugin? In addition, Drill has
>>> supported Apache Cassandra since 1.19.
>>> Absolutely… Could you tell me which storage plugins (or data formats)
>>> you use most often? Thanks for your time.
>>>
>>>
>>> -- luoc
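
One possible way to approach the clustered-key case Ted describes in the quoted message: sort the keys in s and start a new group whenever the gap to the previous key exceeds a threshold, then issue one range scan per group. The sketch below is Python and purely illustrative. `cluster_keys` and `max_gap` are hypothetical names, integer keys stand in for HBase row keys, and nothing here is part of Drill or of the DRILL-7985 framework.

```python
from typing import Iterable, List, Tuple


def cluster_keys(keys: Iterable[int], max_gap: int) -> List[Tuple[int, int]]:
    """Group keys into inclusive (start, stop) ranges.

    Neighbouring keys (in sorted order) share a range when they differ
    by at most max_gap. Both names are hypothetical, for illustration only.
    """
    if max_gap < 0:
        raise ValueError("max_gap must be non-negative")
    ranges: List[Tuple[int, int]] = []
    for key in sorted(set(keys)):
        if ranges and key - ranges[-1][1] <= max_gap:
            # Close enough to the previous key: extend the current range.
            ranges[-1] = (ranges[-1][0], key)
        else:
            # Gap too large (or first key): start a new range.
            ranges.append((key, key))
    return ranges


if __name__ == "__main__":
    s = [1001, 1003, 1002, 50_000, 50_004, 50_001, 9]
    # Each printed range would correspond to one bounded scan
    # instead of a single scan over the whole table.
    for start, stop in cluster_keys(s, max_gap=10):
        print(f"scan keys {start}..{stop}")
```

Choosing `max_gap` is the open question Ted raises. A fixed threshold is only the simplest heuristic, and a real planner would presumably weigh the cost of extra scans against the rows skipped.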
