[ https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kadir Ozdemir updated PHOENIX-6458:
-----------------------------------
Description:
The Phoenix query optimizer does not use a global index for a query that
references columns not covered by that index unless the query carries an index
hint for it. With the hint, the optimizer rewrites the query so that the index
is used within a subquery. The Phoenix client retrieves the row keys of the
index rows that satisfy this subquery and pushes them into the Phoenix server
caches of the data table regions. Finally, on the server side, data table rows
are scanned and joined with the index rows using HashJoin. Depending on the
selectivity of the original query, this join may still scan a large number of
data table rows.
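As a rough illustration of why the hinted plan can still touch many data table
rows, here is a toy Python model of the flow described above. It is not Phoenix
code: the tables, column names, and the `hinted_plan` helper are hypothetical,
and the model assumes the worst case in which the data table scan is not
narrowed by any row key range.

```python
# Toy model of the hinted plan; all names and data are hypothetical.
data_table = {  # data row key -> row
    f"id{i:03d}": {"city": "SF" if i % 50 == 0 else "NY", "name": f"user{i}"}
    for i in range(200)
}
# Global index on "city": index row key = (indexed value, data row key).
index_table = sorted((row["city"], pk) for pk, row in data_table.items())


def hinted_plan(city):
    # Client side: run the subquery on the index and collect the matching keys,
    # which would then be pushed into the server caches.
    cached_keys = {pk for value, pk in index_table if value == city}
    # Server side: scan data table rows and probe the cached keys (hash join).
    scanned = 0
    result = []
    for pk, row in data_table.items():
        scanned += 1
        if pk in cached_keys:
            result.append(row)
    return result, scanned


hinted_rows, hinted_scanned = hinted_plan("SF")
print(len(hinted_rows), hinted_scanned)
```

In this model only a handful of rows match, yet every data table row is
visited by the join.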
Eliminating these data table scans would be a significant improvement. To do
that, instead of rewriting the query, the Phoenix optimizer treats the global
index as a covered index for the given query. The optimizer can then choose the
index table for the query, especially when the index row key prefix length is
greater than the data row key prefix length for the query. On the server side,
the index table is scanned using the index row key ranges implied by the query,
and the index row keys are then mapped to the data table row keys (note that an
index row key includes all the data row key columns). Finally, the
corresponding data table rows are retrieved using server-to-server RPCs.
PHOENIX-6458 (this Jira) retrieves the data table rows one by one using the
HBase get operation. PHOENIX-6501 replaces this get operation with the scan
operation to reduce the number of server-to-server RPC calls.
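The server-side flow can be sketched with a small Python model. This is only an
illustration under stated assumptions, not the Phoenix implementation: the
tables, `scan_index`, `fetch_with_gets`, `fetch_with_scans`, and the
`batch_size` parameter are all hypothetical, and the batching shown for the
scan variant is a guess at how fewer RPCs might be achieved.

```python
from bisect import bisect_left

# Hypothetical data table (data row key -> row) and global index on "city".
data_table = {
    f"id{i:03d}": {"city": "SF" if i % 50 == 0 else "NY", "name": f"user{i}"}
    for i in range(200)
}
# Index row key = (indexed value, data row key), kept in sorted order.
index_table = sorted((row["city"], pk) for pk, row in data_table.items())


def scan_index(city):
    """Range-scan the index for one value and yield the mapped data row keys."""
    start = bisect_left(index_table, (city, ""))
    for value, pk in index_table[start:]:
        if value != city:
            break
        yield pk  # the index row key already contains the data row key


def fetch_with_gets(city):
    """Model of one get per index row: one RPC for each data table row."""
    rows, rpcs = [], 0
    for pk in scan_index(city):
        rows.append(data_table[pk])
        rpcs += 1
    return rows, rpcs


def fetch_with_scans(city, batch_size=3):
    """Assumed batching model: one scan RPC per batch of data row keys."""
    keys = list(scan_index(city))
    rows, rpcs = [], 0
    for i in range(0, len(keys), batch_size):
        rows.extend(data_table[pk] for pk in keys[i:i + batch_size])
        rpcs += 1
    return rows, rpcs


get_rows, get_rpcs = fetch_with_gets("SF")
scan_rows, scan_rpcs = fetch_with_scans("SF")
print(get_rpcs, scan_rpcs)
```

Both variants return the same rows and read only the matching data table rows;
they differ in how many simulated RPCs are issued.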
was: The Phoenix client does not use a global index for queries that reference
columns not covered by the global index. However, there are many cases where
using the global index to map secondary keys to primary keys and then
retrieving the corresponding rows from the data table results in faster
queries. Such a performance improvement is expected when the index row key
prefix length is greater than the data row key prefix length for a given
query.
> Using global indexes for queries with uncovered columns
> -------------------------------------------------------
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.1.0
> Reporter: Kadir Ozdemir
> Priority: Major
> Attachments: PHOENIX-6458.master.001.patch
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)