Chinmay Kulkarni commented on PHOENIX-5796:

[~dbwong] [~comnetwork] Thanks for looking through the code and analyzing this 
in more detail! Makes sense. So it looks like the behavior after you provide an 
index hint is optimal. IMO, this should be the default behavior if such an 
index exists (even when an index hint is not provided), so there is still room 
for improvement in our query optimizer. 

Also, perhaps the index hint query should not show up as a FULL TABLE SCAN in 
the explain plan. We can improve the explain plan to indicate the semi join, or 
at the very least, add more documentation around understanding a 
full-table-scan in the context of a "SKIP-SCAN-JOIN" on the website for users.

Overall, it looks to me that the improvement is still a valid one, though 
luckily it is already done when you provide the index hint, WDYT [~dbwong] 

> Possible query optimization when projecting uncovered columns and querying on 
> indexed columns
> ---------------------------------------------------------------------------------------------
>                 Key: PHOENIX-5796
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5796
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.15.0
>            Reporter: Chinmay Kulkarni
>            Priority: Major
>         Attachments: Screen Shot 2020-03-23 at 3.25.38 PM.png, Screen Shot 
> 2020-03-23 at 3.32.24 PM.png, Screen Shot 2020-03-24 at 11.51.12 AM.png
> Start HBase-1.3 server with Phoenix-4.15.0-HBase-1.3 server jar. Connect to 
> it using sqlline.py which has Phoenix-4.15.0-HBase-1.3 Phoenix client.
> Create a base table like:
> {code:sql}
> create table t (a integer primary key, b varchar(10), c integer);
> {code}
> Create an uncovered index on top of it like:
> {code:sql}
> create index uncov_index_t on t(b);
> {code}
> Now if you issue the query:
> {code:sql}
> explain select c from t where b='abc';
> {code}
> You'd see the following explain plan:
>  !Screen Shot 2020-03-23 at 3.25.38 PM.png|height=150,width=700!
> *Which is a full table scan on the base table 't'* since we cannot use the 
> global index as 'c' is not a covered column in the global index.
> *However, projecting columns contained fully within the index pk is correctly 
> a range scan:*
> {code:sql}
> explain select a,b from t where b='abc';
> {code}
> produces the following explain plan:
>  !Screen Shot 2020-03-23 at 3.32.24 PM.png|height=150,width=700! 
> In the first query, can there be an optimization to *query the index table, 
> get the start and stop keys of the base table and then issue a range 
> scan/(bunch of point lookups) on the base table* instead of doing a full 
> table scan on the base table like we currently do?

This message was sent by Atlassian Jira

Reply via email to