Re: Query the HBase data in Drill

Charles Givre Wed, 25 Aug 2021 08:23:18 -0700

Hi Luoc,
There is actually a pending PR, DRILL-7985 [1], which we should merge before 
we do any additional work on the HBase plugin.  This PR introduces a new 
framework for pushdowns which will make it a lot easier to implement pushdowns 
for the various storage plugins.  I would recommend reading the docs for that 
PR, as we can really make the HBase plugin a lot more robust than it currently 
is.


Best,
-- C 

1: https://github.com/apache/drill/pull/2289 




> On Aug 25, 2021, at 10:43 AM, luoc <[email protected]> wrote:
> 
>  Thanks for the feedback. Apache HBase and Apache Phoenix are an important 
> part of my work. I'm not sure whether anyone has started the `HBase to 
> EVF` work for Drill, but that improvement would be valuable.
>  In particular, I recently used Phoenix 5.1 + HBase 2.3 on Hadoop 3.3 and 
> found a big improvement over the Phoenix 4.x and HBase 1.x series.
>  I look forward to seeing Drill benefit from these improvements.
> 
>> On Aug 24, 2021, at 23:16, Ted Dunning <[email protected]> wrote:
>> 
>> I know somebody who is querying a very large table and has trouble with
>> pushdown.
>> 
>> They are looking for values indexed by primary key with a query like
>> "select * from table where key in s".  If s has a very small number of
>> values, this turns into primary key access, but if there are more than just
>> a few, it becomes a scan.
>> 
>> The situation that would be interesting to detect is where s has a few
>> tightly clustered groups. The ideal strategy would be to scan each group.
>> How this might be detected isn't clear to me, but it would make a massive
>> difference to this kind of query.
>> 
>> Currently, the best alternative is to try to avoid this kind of query and
>> build a data flow such that each cluster of keys flows into a separate
>> query. This would be made easier if a common table expression (CTE) query
>> could be done without having the optimizer try to globally optimize back to
>> a single big scan.
>> 
>> Anyway, I have absolutely no concrete suggestions for making this work, but
>> the need is there.
>> 
>> 
>>> On Tue, Aug 24, 2021 at 4:39 AM luoc <[email protected]> wrote:
>>> 
>>> Hello Guys,
>>> Do you use Drill to query Apache HBase? If so, what new features would
>>> you like to see in the HBase storage plugin? In addition, Drill has
>>> supported Apache Cassandra since 1.19.
>>> Also, could you tell me which storage plugins (or data formats) you use
>>> most often? Thanks for your time.
>>> 
>>> 
>>> -- luoc
>
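
To illustrate the idea Ted describes above (scanning each tightly clustered group of keys in s, rather than doing many point lookups or one big scan), here is a minimal sketch. It is not how Drill or the HBase plugin works today. The function name `cluster_keys`, the `max_gap` threshold, and the use of integer keys are all assumptions made for illustration; real HBase row keys are byte arrays and would need a different notion of distance.

```python
from typing import Iterable, List, Tuple


def cluster_keys(keys: Iterable[int], max_gap: int) -> List[Tuple[int, int]]:
    """Group keys into inclusive (start, stop) ranges.

    Hypothetical helper: two neighbouring keys (in sorted order) land in the
    same range when they differ by at most max_gap. Each resulting range
    could then be issued as one bounded scan.
    """
    ranges: List[Tuple[int, int]] = []
    for key in sorted(set(keys)):
        if ranges and key - ranges[-1][1] <= max_gap:
            # Close enough to the previous key: extend the current range.
            ranges[-1] = (ranges[-1][0], key)
        else:
            # Too far away (or first key): start a new range.
            ranges.append((key, key))
    return ranges


# Made-up key set with two tight clusters and one outlier.
s = [1001, 1002, 1005, 50210, 50211, 907000]
print(cluster_keys(s, max_gap=10))
# prints [(1001, 1005), (50210, 50211), (907000, 907000)]
```

Choosing `max_gap` is the hard part Ted points at: it trades the cost of reading unwanted rows inside a range against the cost of issuing one more scan, and a planner would need statistics to pick it well.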
