[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502729#comment-17502729
 ] 

Lars Hofhansl commented on PHOENIX-6458:
----------------------------------------

I think I figured out the problem... It's an HBase 2.x problem:
In HBase 2.x the RPC handler is responsible for closing scanners. However, when 
you retrieve a Connection from a RegionCoprocessorEnvironment and the target 
happens to be local, there is no RPC handler, and hence the RegionScanners 
are never closed. This is a gaping HBase bug.
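
To make the mechanism concrete, here is a toy model in plain Java. It is only a 
sketch and not HBase code: {{ToyRegionServer}}, {{ToyScanner}}, and the two scan 
methods are hypothetical stand-ins, and the only behavior modeled is the one 
described above (cleanup lives in the RPC handler, and the local path skips it).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these classes are HBase classes.
public class ScannerLeakModel {
    /** Hypothetical stand-in for a server-side RegionScanner. */
    static final class ToyScanner {
        boolean closed = false;
    }

    /** Hypothetical stand-in for a region server that tracks every scanner it opens. */
    static final class ToyRegionServer {
        final List<ToyScanner> opened = new ArrayList<>();

        ToyScanner openScanner() {
            ToyScanner scanner = new ToyScanner();
            opened.add(scanner);
            return scanner;
        }

        /** Remote path: the RPC handler wraps the call and closes the scanner when the call ends. */
        void scanViaRpcHandler() {
            ToyScanner scanner = openScanner();
            try {
                // rows would be read here
            } finally {
                scanner.closed = true; // cleanup is owned by the handler
            }
        }

        /** Local short-circuit path: no handler runs, so nothing closes the scanner. */
        void scanViaLocalShortCircuit() {
            openScanner();
        }

        /** Number of scanners that were opened and never closed. */
        long openScannerCount() {
            return opened.stream().filter(s -> !s.closed).count();
        }
    }

    public static void main(String[] args) {
        ToyRegionServer server = new ToyRegionServer();
        server.scanViaRpcHandler();        // closed by the handler
        server.scanViaLocalShortCircuit(); // leaked
        server.scanViaLocalShortCircuit(); // leaked
        System.out.println("scanners still open: " + server.openScannerCount());
    }
}
```

In this model every local call leaves one more scanner open, which is the kind 
of leak I believe we are seeing.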

In Phoenix we could work around that by creating a separate connection with 
{{org.apache.hadoop.hbase.client.ConnectionFactory#createConnection}}, but that 
is very expensive. I think we should give up on fixing it here, focus on 
PHOENIX-6501, and file an HBase bug.

> Using global indexes for queries with uncovered columns
> -------------------------------------------------------
>
>                 Key: PHOENIX-6458
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6458
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.1.0
>            Reporter: Kadir Ozdemir
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>             Fix For: 4.17.0, 5.2.0, 5.1.3
>
>         Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with 
> columns that are not covered by that index unless the query has an index 
> hint for it. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large number of data table rows.
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.
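
The difference between the get-based lookup in PHOENIX-6458 and the scan-based 
lookup in PHOENIX-6501 described above can be illustrated with a toy cost model 
in plain Java. Everything in it is an assumption made for illustration: the 
{{value|dataRowKey}} index row key layout, the {{ToyDataTable}} class, and the 
rule that one scan RPC returns up to {{batchSize}} rows. It uses no Phoenix or 
HBase classes and does not reflect the actual implementation of either Jira.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy cost model only: no HBase or Phoenix classes are used.
public class IndexLookupModel {
    /** Hypothetical index row key layout: indexed value, '|', then the data row key. */
    static String toDataRowKey(String indexRowKey) {
        return indexRowKey.substring(indexRowKey.indexOf('|') + 1);
    }

    /** Hypothetical data table that counts server-to-server RPCs. */
    static final class ToyDataTable {
        final NavigableMap<String, String> rows = new TreeMap<>();
        int rpcCount = 0;

        /** One point lookup costs one RPC in this model. */
        String get(String rowKey) {
            rpcCount++;
            return rows.get(rowKey);
        }

        /** One scan RPC is assumed to return up to batchSize of the requested rows. */
        List<String> scan(List<String> rowKeys, int batchSize) {
            List<String> out = new ArrayList<>();
            for (int i = 0; i < rowKeys.size(); i += batchSize) {
                rpcCount++;
                int end = Math.min(i + batchSize, rowKeys.size());
                for (String key : rowKeys.subList(i, end)) {
                    out.add(rows.get(key));
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ToyDataTable table = new ToyDataTable();
        List<String> dataRowKeys = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            table.rows.put("row" + i, "value" + i);
            // Map each index row key back to its data row key.
            dataRowKeys.add(toDataRowKey("blue|row" + i));
        }

        for (String key : dataRowKeys) {
            table.get(key); // one RPC per row
        }
        int getRpcs = table.rpcCount;

        table.rpcCount = 0;
        table.scan(dataRowKeys, 100); // all rows fit in one batch
        int scanRpcs = table.rpcCount;

        System.out.println("get RPCs: " + getRpcs + ", scan RPCs: " + scanRpcs);
    }
}
```

Under these assumptions the number of get RPCs grows with the number of matching 
index rows, while the number of scan RPCs grows only with the number of batches.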



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
