[
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502502#comment-17502502
]
Kadir OZDEMIR commented on PHOENIX-6458:
----------------------------------------
[~larsh], I am not sure why the scanner leak happens for the global index but
not for the local index, as both essentially use the get operation in the same
way. Instead of fixing this Jira, can we test the patch for PHOENIX-6501, as it
uses scan rather than get? I have attached the PHOENIX-6501 patch for you to
test.
> Using global indexes for queries with uncovered columns
> -------------------------------------------------------
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.1.0
> Reporter: Kadir Ozdemir
> Assignee: Kadir OZDEMIR
> Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch,
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query that
> references columns not covered by that index unless the query includes an
> index hint for it. With the index hint, the optimizer rewrites the query so
> that the index is used within a subquery. With
> this subquery, the row keys of the index rows that satisfy the subquery are
> retrieved by the Phoenix client and then pushed into the Phoenix server
> caches of the data table regions. Finally, on the server side, data table
> rows are scanned and joined with the index rows using HashJoin. Based on the
> selectivity of the original query, this join operation may still result in
> scanning a large number of data table rows.
> Eliminating these data table scans would be a significant improvement. To do
> that, instead of rewriting the query, the Phoenix optimizer simply treats the
> global index as a covered index for the given query. With this, the Phoenix
> query optimizer chooses the index table for the query especially when the
> index row key prefix length is greater than the data row key prefix length
> for the query. On the server side, the index table is scanned using index row
> key ranges implied by the query and the index row keys are then mapped to the
> data table row keys (please note an index row key includes all the data row
> key columns). Finally, the corresponding data table rows are scanned using
> server-to-server RPCs. PHOENIX-6458 (this Jira) retrieves the data table
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get
> operation with the scan operation to reduce the number of server-to-server
> RPC calls.
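
The server-side flow described above can be illustrated with a small
in-memory sketch. It is a toy model, not Phoenix or HBase code: the tables,
the function names (scan_index, fetch_with_gets, fetch_with_scans), and the
batch_size parameter are all hypothetical, and it assumes that one scan call
can cover a batch of data row keys. It only shows why replacing per-row gets
with scans reduces the number of server-to-server calls.

```python
from typing import Dict, List, Tuple

# Toy data table (hypothetical): data row key -> row.
DATA_TABLE: Dict[int, Dict[str, str]] = {
    1: {"city": "Austin", "name": "Ann"},
    2: {"city": "Boston", "name": "Bob"},
    3: {"city": "Austin", "name": "Cat"},
    4: {"city": "Denver", "name": "Dan"},
    5: {"city": "Austin", "name": "Eve"},
}

# Toy global index on `city`. Each index row key is (city, data row key),
# so it includes all the data row key columns, as the description notes.
INDEX_TABLE: List[Tuple[str, int]] = sorted(
    (row["city"], pk) for pk, row in DATA_TABLE.items()
)


def scan_index(city: str) -> List[int]:
    """Scan the index key range for `city` and map each index row key
    back to its data table row key."""
    return [pk for (indexed_city, pk) in INDEX_TABLE if indexed_city == city]


def fetch_with_gets(data_keys: List[int]) -> Tuple[List[Dict[str, str]], int]:
    """Model of the get-based approach: one call per data row."""
    rows, calls = [], 0
    for key in data_keys:
        calls += 1  # each get is counted as one server-to-server call
        rows.append(DATA_TABLE[key])
    return rows, calls


def fetch_with_scans(
    data_keys: List[int], batch_size: int
) -> Tuple[List[Dict[str, str]], int]:
    """Model of the scan-based approach: one call per batch of row keys
    (an assumption of this sketch)."""
    rows, calls = [], 0
    for start in range(0, len(data_keys), batch_size):
        batch = data_keys[start:start + batch_size]
        calls += 1  # each scan is counted as one server-to-server call
        rows.extend(DATA_TABLE[key] for key in batch)
    return rows, calls


if __name__ == "__main__":
    keys = scan_index("Austin")
    _, get_calls = fetch_with_gets(keys)
    _, scan_calls = fetch_with_scans(keys, batch_size=2)
    print(f"data row keys: {keys}, gets: {get_calls}, scans: {scan_calls}")
```

In this model both approaches return the same rows; only the call count
differs, and the gap grows with the number of matching index rows.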
--
This message was sent by Atlassian Jira
(v8.20.1#820001)