[
https://issues.apache.org/jira/browse/PHOENIX-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066944#comment-17066944
]
Daniel Wong commented on PHOENIX-5796:
--------------------------------------
Its not necessarily optimal.
There are a bunch of possible jiras around optimization that we could consider.
If the selectivity of the index query is high, that is we select a large
portion of the rows it will be faster to full table scan the base table.
With stats or with region boundaries we can better estimate selectivity of
predicates and thus decide if we want to join or FTS.
One easy one i want to open is small scan optimizations where for example if
the user provides a limit we do stuff like this. This exact case plus limit
limit with index join on uncovered columns, i forget if i opened a Jira
already. Using region boundaries would be possibly enough in this case as the
index is on B only, so its meaningful enough, if the index had been on C,B it
would not be sufficient to estimate selectivity and we would need stats.
> Possible query optimization when projecting uncovered columns and querying on
> indexed columns
> ---------------------------------------------------------------------------------------------
>
> Key: PHOENIX-5796
> URL: https://issues.apache.org/jira/browse/PHOENIX-5796
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.0.0, 4.15.0
> Reporter: Chinmay Kulkarni
> Priority: Major
> Attachments: Screen Shot 2020-03-23 at 3.25.38 PM.png, Screen Shot
> 2020-03-23 at 3.32.24 PM.png, Screen Shot 2020-03-24 at 11.51.12 AM.png
>
>
> Start HBase-1.3 server with Phoenix-4.15.0-HBase-1.3 server jar. Connect to
> it using sqlline.py which has Phoenix-4.15.0-HBase-1.3 Phoenix client.
> Create a base table like:
> {code:sql}
> create table t (a integer primary key, b varchar(10), c integer);
> {code}
> Create an uncovered index on top of it like:
> {code:sql}
> create index uncov_index_t on t(b);
> {code}
> Now if you issue the query:
> {code:sql}
> explain select c from t where b='abc';
> {code}
> You'd see the following explain plan:
> !Screen Shot 2020-03-23 at 3.25.38 PM.png|height=150,width=700!
> *Which is a full table scan on the base table 't'* since we cannot use the
> global index as 'c' is not a covered column in the global index.
> *However, projecting columns contained fully within the index pk is correctly
> a range scan:*
> {code:sql}
> explain select a,b from t where b='abc';
> {code}
> produces the following explain plan:
> !Screen Shot 2020-03-23 at 3.32.24 PM.png|height=150,width=700!
> In the first query, can there be an optimization to *query the index table,
> get the start and stop keys of the base table and then issue a range
> scan/(bunch of point lookups) on the base table* instead of doing a full
> table scan on the base table like we currently do?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)