[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kadir Ozdemir updated PHOENIX-6458:
-----------------------------------
    Description: 
Without an index hint, the Phoenix query optimizer does not use a global index 
for a query that references columns not covered by that index. With the index 
hint, the optimizer rewrites the query so that the index is used within a 
subquery. The Phoenix client retrieves the row keys of the index rows that 
satisfy the subquery and pushes them into the Phoenix server caches of the 
data table regions. Finally, on the server side, data table rows are scanned 
and joined with the index rows using HashJoin. Depending on the selectivity of 
the original query, this join operation may still scan a large number of data 
table rows. 
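
As a rough illustration of the hinted path described above, the following 
Python sketch models the data table and the global index as in-memory 
structures. It is not Phoenix code. The schema and all names (DATA_TABLE, 
INDEX_TABLE, hinted_query) are hypothetical, and the toy scans the whole data 
table, which is a worst-case simplification. It only mimics the order of 
operations: collect the matching row keys from the index first, then scan data 
table rows and join them against that cached set.

```python
# Toy model, not Phoenix code: the data table is a dict keyed by row key.
# Hypothetical schema: row key = id; city is indexed; name is NOT covered.
DATA_TABLE = {
    1: {"city": "Oslo", "name": "Ann"},
    2: {"city": "Rome", "name": "Bob"},
    3: {"city": "Oslo", "name": "Cy"},
    4: {"city": "Lima", "name": "Dee"},
}

# Global index on city: index row key = (city, id), so it embeds the data row key.
INDEX_TABLE = sorted((row["city"], pk) for pk, row in DATA_TABLE.items())


def hinted_query(city):
    """Return (result rows, number of data table rows scanned)."""
    # Step 1 (client side): run the subquery on the index, collect data row keys.
    cached_keys = {pk for (c, pk) in INDEX_TABLE if c == city}

    # Step 2 (server side): scan data table rows and join against the cached keys.
    # Assumption: the scan is unbounded here, i.e. every data row is visited.
    scanned = 0
    results = []
    for pk, row in DATA_TABLE.items():
        scanned += 1
        if pk in cached_keys:  # set lookup stands in for the hash join
            results.append((pk, row["name"]))
    return results, scanned
```

In this toy, a query for one city returns two rows but still visits all four 
data table rows, which is the cost the proposal aims to remove.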


Eliminating these data table scans would be a significant improvement. To do 
that, instead of rewriting the query, the Phoenix optimizer simply treats the 
global index as a covered index for the given query. The optimizer then 
chooses the index table for the query, especially when the index row key 
prefix length is greater than the data row key prefix length for the query. On 
the server side, the index table is scanned using the index row key ranges 
implied by the query, and the index row keys are then mapped to data table row 
keys (note that an index row key includes all the data row key columns). 
Finally, the corresponding data table rows are retrieved using 
server-to-server RPCs. PHOENIX-6458 (this Jira) retrieves the data table rows 
one by one using the HBase get operation. PHOENIX-6501 replaces this get 
operation with a scan operation to reduce the number of server-to-server RPC 
calls.
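
The flow above can be sketched in Python as follows. This is a toy model under 
stated assumptions, not Phoenix or HBase code: the schema and all function 
names are hypothetical, a counter stands in for server-to-server RPC calls, 
and fixed-size batching is only an assumed stand-in for the scan-based 
retrieval, whose actual grouping is not described here.

```python
import bisect

# Toy model: hypothetical schema with row key = id; name is not in the index.
DATA_TABLE = {
    1: {"city": "Oslo", "name": "Ann"},
    2: {"city": "Rome", "name": "Bob"},
    3: {"city": "Oslo", "name": "Cy"},
    4: {"city": "Lima", "name": "Dee"},
}

# Index row key = (city, id): it includes all data row key columns.
INDEX_TABLE = sorted((row["city"], pk) for pk, row in DATA_TABLE.items())


def index_scan(city):
    """Scan the index key range for `city`; map index row keys to data row keys."""
    start = bisect.bisect_left(INDEX_TABLE, (city,))  # first key with this prefix
    data_keys = []
    for c, pk in INDEX_TABLE[start:]:
        if c != city:  # left the key range implied by the query
            break
        data_keys.append(pk)  # the data row key is the tail of the index row key
    return data_keys


def fetch_with_gets(data_keys):
    """One simulated RPC per data table row (per-row get)."""
    rpc_calls = 0
    rows = []
    for pk in data_keys:
        rpc_calls += 1
        rows.append((pk, DATA_TABLE[pk]["name"]))
    return rows, rpc_calls


def fetch_in_batches(data_keys, batch_size=2):
    """Assumed batching: several rows per simulated RPC."""
    rpc_calls = 0
    rows = []
    for i in range(0, len(data_keys), batch_size):
        rpc_calls += 1
        for pk in data_keys[i:i + batch_size]:
            rows.append((pk, DATA_TABLE[pk]["name"]))
    return rows, rpc_calls
```

Both fetch functions return the same rows and touch only the data table rows 
named by the index. They differ only in how many simulated calls they make.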

  was: The Phoenix client does not use a global index for queries that 
reference columns not covered by the global index. However, there are many 
cases where using the global index to map secondary keys to primary keys and 
then retrieving the corresponding rows from the data table results in faster 
queries. Such a performance improvement is expected when the index row key 
prefix length is greater than the data row key prefix length for a given 
query. 
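
One way to read the prefix length comparison is sketched below in Python. It 
assumes that "prefix length" means the number of leading row key columns that 
the query constrains. That reading, the schema, and the function names are 
assumptions for illustration only, and this is not the optimizer's actual 
cost model.

```python
def key_prefix_length(row_key_columns, constrained_columns):
    """Count the leading row key columns that the query constrains."""
    length = 0
    for col in row_key_columns:
        if col not in constrained_columns:
            break  # the usable prefix ends at the first unconstrained column
        length += 1
    return length


# Hypothetical schema: data row key is (id); index row key is (city, id).
DATA_ROW_KEY = ["id"]
INDEX_ROW_KEY = ["city", "id"]


def prefers_index(constrained_columns):
    """True when the index offers a longer usable key prefix than the data table."""
    return (key_prefix_length(INDEX_ROW_KEY, constrained_columns)
            > key_prefix_length(DATA_ROW_KEY, constrained_columns))
```

Under these assumptions, a query that filters only on city has prefix length 1 
on the index and 0 on the data table, so the index would be preferred. A query 
that filters only on id gives 0 and 1, so it would not.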


> Using global indexes for queries with uncovered columns
> -------------------------------------------------------
>
>                 Key: PHOENIX-6458
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6458
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.1.0
>            Reporter: Kadir Ozdemir
>            Priority: Major
>         Attachments: PHOENIX-6458.master.001.patch
>
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
