[ 
https://issues.apache.org/jira/browse/PHOENIX-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085955#comment-14085955
 ] 

James Taylor commented on PHOENIX-1146:
---------------------------------------

bq. Would it help to set up the ClientScanners such that they do not retry and 
hence pass an NSRE up immediately.

I do think that'd be an improvement, as it seems that the client cannot always 
recover correctly based on the conversation in HBASE-11667. The idea of the 
Phoenix workaround for this, though, is to only retry the parallel chunk of 
work that fails, not the entire query. That way the partial work/results 
produced by the other parallel scans would not need to be restarted. A further 
optimization that Phoenix would do that likely wouldn't be possible in HBase in 
general would be to throw the exception *before* the scan starts, as we can 
detect that based on the start/stop row in the scan versus the region 
boundaries. I'm guessing HBase detects this situation while it's partway 
through the scan. There's also the issue that interrupting a thread causes the 
HConnection to be closed (is this always the case?) which would occur if the 
query is canceled and retried in its entirety. The solutions I'm proposing 
wouldn't require the client threads to be interrupted.
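
A rough sketch of the up-front check is below. It is not the actual Phoenix 
code: the class, method, and exception names are hypothetical stand-ins, and it 
assumes the usual HBase convention that an empty byte array means an open-ended 
start or end key. The server compares the scan's start/stop row with the 
region's boundaries and fails before any rows are read if the scan does not fit.

```java
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for the exception the server would raise;
// the real change would wrap this information in a DoNotRetryIOException.
class StaleBoundaryException extends IOException {
    StaleBoundaryException(String message) {
        super(message);
    }
}

// Hypothetical helper, not a Phoenix or HBase class.
final class StaleBoundaryCheck {
    private StaleBoundaryCheck() {}

    // Returns true if [scanStart, scanStop) lies entirely inside
    // [regionStart, regionEnd). Empty arrays are treated as open-ended keys.
    static boolean isWithinRegion(byte[] scanStart, byte[] scanStop,
                                  byte[] regionStart, byte[] regionEnd) {
        // An open-ended region start accepts any scan start; otherwise the
        // scan start must be bounded and not sort before the region start.
        boolean startOk = regionStart.length == 0
                || (scanStart.length != 0
                    && Arrays.compareUnsigned(scanStart, regionStart) >= 0);
        // Same reasoning for the exclusive upper bound.
        boolean stopOk = regionEnd.length == 0
                || (scanStop.length != 0
                    && Arrays.compareUnsigned(scanStop, regionEnd) <= 0);
        return startOk && stopOk;
    }

    // Called before the scan starts, so no partial results are produced.
    static void checkBeforeScan(byte[] scanStart, byte[] scanStop,
                                byte[] regionStart, byte[] regionEnd)
            throws StaleBoundaryException {
        if (!isWithinRegion(scanStart, scanStop, regionStart, regionEnd)) {
            throw new StaleBoundaryException(
                    "Scan spans a region boundary; client region cache is stale");
        }
    }
}
```

For example, if the client planned a scan over [c, k) against a region it 
believed covered [b, m), and that region has since split at g, the check 
against the new region [b, g) fails immediately.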

> Detect stale client region cache on server and retry scans in split regions
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-1146
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1146
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 3.1, 4.1
>            Reporter: James Taylor
>            Assignee: James Taylor
>
> HBase cannot recover correctly from an aggregate scan run on the coprocessor 
> side (see HBASE-11667). This can lead to incorrect query results the first 
> time a query is run after a split occurs (due to the region boundary cache 
> being stale). Phoenix can work around this by:
> - detecting on server before the scan starts that the region cache used by 
> the client is out-of-date. This can be done up-front because the start/stop 
> row of the scan should never span across a region boundary. In this case, a 
> DoNotRetryIOException is thrown with some embedded information to cause a 
> StaleRegionBoundaryCacheException to be thrown on the client.
> - catching this exception on the client (in ParallelIterators), refreshing 
> the region boundary cache, and re-running the necessary scans based on the 
> new region boundaries.
> - detecting if this happens more than N times to prevent any kind of 
> excessive looping due to splits occurring over and over again.
> Phoenix has additional requirements above and beyond standard HBase clients, 
> so even if HBase could recover from this situation, Phoenix would likely need 
> this workaround to ensure that a scan does not span across region boundaries. 
> This is required when the client is doing a merge sort on the results of the 
> parallel scans, mainly in ORDER BY (including topN) and local indexing, and 
> potentially GROUP BY if we move toward sorting the distinct groups on the 
> server side.
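
The client-side half of the workaround described above might look roughly like 
the sketch below. It is only an illustration under stated assumptions, not the 
ParallelIterators implementation: all names are hypothetical, the chunks run 
sequentially rather than in parallel, and the replan function stands in for 
"refresh the region boundary cache and split the failed chunk along the new 
boundaries". Only chunks that hit a stale boundary are re-run, and the loop 
gives up after a fixed number of retry rounds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the retry loop; not a Phoenix class.
final class StaleCacheRetry {
    private StaleCacheRetry() {}

    // Hypothetical stand-in for StaleRegionBoundaryCacheException.
    static final class StaleBoundaryException extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    // Runs every chunk, re-planning and re-running only the chunks that
    // report a stale boundary. Results of successful chunks are kept.
    // Note: results are appended in completion order, so a caller that
    // merge-sorts would still need to order them by key range.
    static <C, R> List<R> runAll(List<C> chunks,
                                 Function<C, R> scan,
                                 Function<C, List<C>> replan,
                                 int maxRetries) {
        List<R> results = new ArrayList<>();
        List<C> pending = new ArrayList<>(chunks);
        // attempt 0 is the initial pass; attempts 1..maxRetries are retries.
        for (int attempt = 0; !pending.isEmpty(); attempt++) {
            List<C> failed = new ArrayList<>();
            for (C chunk : pending) {
                try {
                    results.add(scan.apply(chunk));
                } catch (StaleBoundaryException e) {
                    // Replace the failed chunk with chunks that follow
                    // the refreshed region boundaries.
                    failed.addAll(replan.apply(chunk));
                }
            }
            if (!failed.isEmpty() && attempt >= maxRetries) {
                throw new IllegalStateException(
                        "Region boundaries still stale after "
                        + maxRetries + " retries");
            }
            pending = failed;
        }
        return results;
    }
}
```

Because successful chunks are never resubmitted, a split in one region costs 
only the scans over that region, and no client thread needs to be interrupted.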



--
This message was sent by Atlassian JIRA
(v6.2#6252)
