[
https://issues.apache.org/jira/browse/HBASE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269207#comment-14269207
]
Anoop Sam John commented on HBASE-11295:
----------------------------------------
On the client side, when a retry happens (due to a timeout), it should retry
with the same seqId. If you look at the server side, as soon as the request
arrives we increment the expected nextSeqId. If the client does not send this
number in the next call, that is a problem.
On OutOfOrderScannerNextException, we retry with a new Scan, but this retry
happens only once. If the new Scan also hits the exception (same filtered scan,
again taking a long time), we get the exception again and throw it back to the
client. Is this what is happening?
By design, on the client side, the nextCallSeq increment should not happen for
a call that is retried after a timeout.
Am I still missing something?
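
To make the server-side behaviour concrete, here is a minimal sketch of the check as I described it above. It is a toy model, not the HRegionServer code: the class and method names (ScanSeqSketch, Server, ScannerHolder, OutOfOrderException) are all made up for illustration, and the only thing taken from the issue is that the expected nextCallSeq is incremented as soon as a request arrives.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the per-scanner sequence check; this is not HBase code. */
final class ScanSeqSketch {

    /** Hypothetical stand-in for the server's cached per-scanner state. */
    static final class ScannerHolder {
        long nextCallSeq = 0L;
    }

    /** Hypothetical stand-in for OutOfOrderScannerNextException. */
    static final class OutOfOrderException extends RuntimeException {
        OutOfOrderException(String message) {
            super(message);
        }
    }

    /** Hypothetical server tracking one expected sequence number per scanner ID. */
    static final class Server {
        private final Map<Long, ScannerHolder> scanners = new HashMap<>();

        void open(long scannerId) {
            scanners.put(scannerId, new ScannerHolder());
        }

        long expectedSeq(long scannerId) {
            return scanners.get(scannerId).nextCallSeq;
        }

        void next(long scannerId, long clientSeq) {
            ScannerHolder holder = scanners.get(scannerId);
            if (clientSeq != holder.nextCallSeq) {
                throw new OutOfOrderException(
                    "expected " + holder.nextCallSeq + " but got " + clientSeq);
            }
            // Incremented immediately on arrival, before any (possibly slow) scan work.
            holder.nextCallSeq++;
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        server.open(42L);
        server.next(42L, 0L); // first call is accepted; server now expects 1
        try {
            server.next(42L, 0L); // a retry that still carries 0 is rejected
        } catch (OutOfOrderException e) {
            System.out.println("retry rejected: " + e.getMessage());
        }
    }
}
```

In this model, a request that reaches the server advances the expected number even if the client never sees the response, so a later call carrying the old number is rejected.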
> Long running scan produces OutOfOrderScannerNextException
> ---------------------------------------------------------
>
> Key: HBASE-11295
> URL: https://issues.apache.org/jira/browse/HBASE-11295
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.96.0
> Reporter: Jeff Cunningham
> Assignee: Andrew Purtell
> Priority: Critical
> Fix For: 1.0.0, 2.0.0, 0.98.10, 1.1.0
>
> Attachments: OutOfOrderScannerNextException.tar.gz
>
>
> Attached Files:
> HRegionServer.java - instrumented from 0.96.1.1-cdh5.0.0
> HBaseLeaseTimeoutIT.java - reproducing JUnit 4 test
> WaitFilter.java - Scan filter (extends FilterBase) that overrides
> filterRowKey() to sleep during invocation
> SpliceFilter.proto - Protobuf definition for WaitFilter.java
> OutOfOrderScann_InstramentedServer.log - instrumented server log
> Steps.txt - this note
> Set up:
> In HBaseLeaseTimeoutIT, create a scan, set the given filter on it (the filter
> sleeps in its overridden filterRowKey() method), and scan the table.
> This is done in test client_0x0_server_150000x10().
> Here's what I'm seeing (see also attached log):
> A new request comes into the server (ID 1940798815214593802 -
> RpcServer.handler=96) and a RegionScanner is created for it, cached by ID,
> and immediately looked up again; the cached RegionScannerHolder's nextCallSeq
> is incremented (now at 1).
> The RegionScan thread goes to sleep in WaitFilter#filterRowKey().
> A short (variable) period later, another request comes into the server (ID
> 8946109289649235722 - RpcServer.handler=98) and the same series of events
> happens for this request.
> At this point both RegionScanner threads are sleeping in
> WaitFilter.filterRowKey(). After another period, the client retries another
> scan request which thinks its next_call_seq is 0. However, HRegionServer's
> cached RegionScannerHolder thinks the matching RegionScanner's nextCallSeq
> should be 1.