Chinmay Kulkarni commented on PHOENIX-5796:

[~dbwong] Agreed that in general, we can use stats for selecting more optimal 
queries. However, if stats is turned off and we have a scenario such as this 
particular example, more often than not a range scan on the global index 
followed by a range scan on the base table may be more optimal than a full 
table scan on the base table (would need stats to be sure) in normal production 
use-cases, given a good indexing scheme.
This is debatable and possibly a config that a client would want to be able to 
toggle. This is possibly why we never do this by default in the first case.

Can you point me towards that Jira you opened? I'd like to track such 
stats-based query optimization Jiras since such micro-improvements can 
contribute a lot to the overall query time.

> Possible query optimization when projecting uncovered columns and querying on 
> indexed columns
> ---------------------------------------------------------------------------------------------
>                 Key: PHOENIX-5796
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5796
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.15.0
>            Reporter: Chinmay Kulkarni
>            Priority: Major
>              Labels: query-optimization
>         Attachments: Screen Shot 2020-03-23 at 3.25.38 PM.png, Screen Shot 
> 2020-03-23 at 3.32.24 PM.png, Screen Shot 2020-03-24 at 11.51.12 AM.png
> Start HBase-1.3 server with Phoenix-4.15.0-HBase-1.3 server jar. Connect to 
> it using sqlline.py which has Phoenix-4.15.0-HBase-1.3 Phoenix client.
> Create a base table like:
> {code:sql}
> create table t (a integer primary key, b varchar(10), c integer);
> {code}
> Create an uncovered index on top of it like:
> {code:sql}
> create index uncov_index_t on t(b);
> {code}
> Now if you issue the query:
> {code:sql}
> explain select c from t where b='abc';
> {code}
> You'd see the following explain plan:
>  !Screen Shot 2020-03-23 at 3.25.38 PM.png|height=150,width=700!
> *Which is a full table scan on the base table 't'* since we cannot use the 
> global index as 'c' is not a covered column in the global index.
> *However, projecting columns contained fully within the index pk is correctly 
> a range scan:*
> {code:sql}
> explain select a,b from t where b='abc';
> {code}
> produces the following explain plan:
>  !Screen Shot 2020-03-23 at 3.32.24 PM.png|height=150,width=700! 
> In the first query, can there be an optimization to *query the index table, 
> get the start and stop keys of the base table and then issue a range 
> scan/(bunch of point lookups) on the base table* instead of doing a full 
> table scan on the base table like we currently do?

This message was sent by Atlassian Jira

Reply via email to