[ https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498285#comment-17498285 ]
Lars Hofhansl commented on PHOENIX-6458:
----------------------------------------

(y) Awesome

> Using global indexes for queries with uncovered columns
> -------------------------------------------------------
>
>                 Key: PHOENIX-6458
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6458
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.1.0
>            Reporter: Kadir Ozdemir
>            Assignee: Lars Hofhansl
>            Priority: Major
>         Attachments: PHOENIX-6458.master.001.patch,
> PHOENIX-6458.master.002.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the
> columns that are not covered by the global index if the query does not have
> the corresponding index hint for this index. With the index hint, the
> optimizer rewrites the query where the index is used within a subquery. With
> this subquery, the row keys of the index rows that satisfy the subquery are
> retrieved by the Phoenix client and then pushed into the Phoenix server
> caches of the data table regions. Finally, on the server side, data table
> rows are scanned and joined with the index rows using HashJoin. Based on the
> selectivity of the original query, this join operation may still result in
> scanning a large amount of data table rows.
> Eliminating these data table scans would be a significant improvement. To do
> that, instead of rewriting the query, the Phoenix optimizer simply treats the
> global index as a covered index for the given query. With this, the Phoenix
> query optimizer chooses the index table for the query especially when the
> index row key prefix length is greater than the data row key prefix length
> for the query. On the server side, the index table is scanned using index row
> key ranges implied by the query and the index row keys are then mapped to the
> data table row keys (please note an index row key includes all the data row
> key columns). Finally, the corresponding data table rows are scanned using
> server-to-server RPCs.
> PHOENIX-6458 (this Jira) retrieves the data table rows one by one using the
> HBase get operation. PHOENIX-6501 replaces this get operation with the scan
> operation to reduce the number of server-to-server RPC calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
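
For readers following the issue description, the server-side lookup path it describes (scan the index by row key range, map each index row key to a data row key, then fetch the data row) can be sketched roughly as below. This is only an illustrative in-memory model, not Phoenix or HBase code: the tables, the column names (`id`, `city`, `name`, `age`), and the helpers `scan_index`, `index_key_to_data_key`, `get_row`, and `query_uncovered` are all hypothetical, and `get_row` merely stands in for one get per row as described for PHOENIX-6458.

```python
from bisect import bisect_left

# Hypothetical data table: data row key (id,) -> full row.
DATA_TABLE = {
    (1,): {"id": 1, "city": "Oslo", "name": "Ann", "age": 31},
    (2,): {"id": 2, "city": "Rome", "name": "Bob", "age": 45},
    (3,): {"id": 3, "city": "Oslo", "name": "Cy", "age": 27},
}

# Hypothetical global index on `city`. Each index row key is the indexed
# column followed by all data row key columns: (city, id). The index does
# not cover `name` or `age`.
INDEX_KEYS = sorted((row["city"], row["id"]) for row in DATA_TABLE.values())


def scan_index(city):
    """Range-scan the index for keys whose leading column equals `city`."""
    start = bisect_left(INDEX_KEYS, (city,))
    for key in INDEX_KEYS[start:]:
        if key[0] != city:
            break
        yield key


def index_key_to_data_key(index_key):
    """Drop the single indexed column; what remains is the data row key."""
    return index_key[1:]


def get_row(data_key):
    """Stand-in for one point lookup (get) against the data table."""
    return DATA_TABLE[data_key]


def query_uncovered(city):
    """Return full data rows, including columns the index does not cover."""
    return [get_row(index_key_to_data_key(k)) for k in scan_index(city)]


names = [row["name"] for row in query_uncovered("Oslo")]
```

In this toy model the query on `city` never scans the data table; it only touches the data rows whose keys were recovered from the matching index row keys.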