I know somebody who is querying a very large table and is having trouble
with filter pushdown.

They are looking up values by primary key with a query like
"select * from table where key in s". If s has only a handful of values,
this turns into primary key access, but if there are more than just a few,
it becomes a scan.

The situation that would be interesting to detect is where s consists of a
few tightly clustered groups of keys. The ideal strategy would be to scan
each group separately rather than the whole table. How this might be
detected isn't clear to me, but it would make a massive difference to this
kind of query.
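
As a rough illustration only (this is not something Drill does today), one
simple way to detect such groups is to sort the keys and start a new group
wherever the gap between neighbours exceeds a threshold. The sketch below
assumes integer keys; the function name cluster_keys and the max_gap
parameter are made up for the example.

```python
from typing import Iterable, List, Tuple


def cluster_keys(keys: Iterable[int], max_gap: int) -> List[Tuple[int, int]]:
    """Group integer keys into inclusive (low, high) ranges.

    A new range starts whenever the distance to the previous key
    is larger than max_gap (a hypothetical tuning knob).
    """
    ordered = sorted(set(keys))
    if not ordered:
        return []

    ranges: List[Tuple[int, int]] = []
    low = prev = ordered[0]
    for key in ordered[1:]:
        if key - prev > max_gap:
            # Gap too large: close the current range and open a new one.
            ranges.append((low, prev))
            low = key
        prev = key
    ranges.append((low, prev))
    return ranges


if __name__ == "__main__":
    print(cluster_keys([1, 2, 3, 100, 101, 5000], max_gap=10))
```

A planner could then compare the number and width of the resulting ranges
against the cost of one full scan before deciding which strategy to use.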

Currently, the best alternative is to try to avoid this kind of query and
build a data flow such that each cluster of keys flows into a separate
query. This would be made easier if a common table expression (CTE) query
could be done without having the optimizer try to globally optimize back to
a single big scan.
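
A minimal sketch of that workaround on the client side, assuming integer
keys that have already been split into clusters: emit one query per cluster,
bounded by the cluster's lowest and highest key. The table name, column
name, and the function range_queries are placeholders, and whether the
range predicate actually gets pushed down depends on the storage plugin.

```python
from typing import List, Sequence


def range_queries(table: str, key_col: str,
                  groups: Sequence[Sequence[int]]) -> List[str]:
    """Build one SQL string per cluster of integer keys.

    Each query is restricted to the cluster's [min, max] range and
    keeps the IN list so that only the requested keys are returned.
    """
    queries: List[str] = []
    for group in groups:
        if not group:
            continue  # nothing to query for an empty cluster
        keys = sorted({int(k) for k in group})
        in_list = ", ".join(str(k) for k in keys)
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {key_col} BETWEEN {keys[0]} AND {keys[-1]} "
            f"AND {key_col} IN ({in_list})"
        )
    return queries


if __name__ == "__main__":
    # my_table and row_key are hypothetical names.
    for q in range_queries("my_table", "row_key", [[3, 1, 2], [100, 101]]):
        print(q)
```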

Anyway, I have absolutely no concrete suggestions for making this work, but
the need is there.


On Tue, Aug 24, 2021 at 4:39 AM luoc <[email protected]> wrote:

> Hello Guys,
>   Will you use Drill to query Apache HBase? If so, what new feature would
> you like to see in HBase storage plugin? In addition, Drill supported the
> Apache Cassandra since 1.19.
> Absolutely… Could you tell me what your most common storage plugin (or
> data format) are? Thanks for your time.
>
>
> -- luoc
