Enis Soztutar created HBASE-16604:
-------------------------------------

             Summary: Scanner retries on IOException can cause the scans to 
miss data 
                 Key: HBASE-16604
                 URL: https://issues.apache.org/jira/browse/HBASE-16604
             Project: HBase
          Issue Type: Bug
          Components: regionserver, Scanners
            Reporter: Enis Soztutar
            Assignee: Enis Soztutar
             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.3, 1.1.7


Debugging an ITBLL failure, where the Verify did not "see" all the data in the 
cluster, I've noticed that if we end up getting a generic IOException from the 
HFileReader level, we may end up missing the rest of the data in the region. I 
was able to manually test this, and this stack trace helps to understand what 
is going on: 
{code}
2016-09-09 16:27:15,633 INFO  [hconnection-0x71ad3d8a-shared--pool21-t9] 
client.ScannerCallable(376): Open scanner=1 for 
scan={"loadColumnFamiliesOnDemand":null,"startRow":"","stopRow":"","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":2097152,"families":{"testFamily":["testFamily"]},"caching":100,"maxVersions":1,"timeRange":[0,9223372036854775807]}
 on region 
region=testScanThrowsException,,1473463632707.b2adfb618e5d0fe225c1dc40c0eabfee.,
 hostname=hw10676,51833,1473463626529, seqNum=2
2016-09-09 16:27:15,634 INFO  [B.fifo.QRpcServer.handler=5,queue=0,port=51833] 
regionserver.RSRpcServices(2196): scan request:scanner_id: 1 number_of_rows: 
100 close_scanner: false next_call_seq: 0 client_handles_partials: true 
client_handles_heartbeats: true renew: false
2016-09-09 16:27:15,635 INFO  [B.fifo.QRpcServer.handler=5,queue=0,port=51833] 
regionserver.RSRpcServices(2510): Rolling back next call seqId
2016-09-09 16:27:15,635 INFO  [B.fifo.QRpcServer.handler=5,queue=0,port=51833] 
regionserver.RSRpcServices(2565): Throwing new 
ServiceExceptionjava.io.IOException: Could not reseek 
StoreFileScanner[HFileScanner for reader 
reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
 compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, 
currentSize=1567264, freeSize=1525578848, maxSize=1527146112, heapSize=1567264, 
minSize=1450788736, minFactor=0.95, multiSize=725394368, multiFactor=0.5, 
singleSize=362697184, singleFactor=0.25}, cacheDataOnRead=true, 
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, 
cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, 
firstKey=aaa/testFamily:testFamily/1473463633859/Put, 
lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, 
avgValueLen=3, entries=17576, length=866998, 
cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key 
/testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
2016-09-09 16:27:15,635 DEBUG [B.fifo.QRpcServer.handler=5,queue=0,port=51833] 
ipc.CallRunner(110): B.fifo.QRpcServer.handler=5,queue=0,port=51833: callId: 26 
service: ClientService methodName: Scan size: 26 connection: 192.168.42.75:51903
java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for reader 
reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
 compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, 
currentSize=1567264, freeSize=1525578848, maxSize=1527146112, heapSize=1567264, 
minSize=1450788736, minFactor=0.95, multiSize=725394368, multiFactor=0.5, 
singleSize=362697184, singleFactor=0.25}, cacheDataOnRead=true, 
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, 
cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, 
firstKey=aaa/testFamily:testFamily/1473463633859/Put, 
lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, 
avgValueLen=3, entries=17576, length=866998, 
cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key 
/testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224)
        at 
org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
        at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: foo bar baz
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207)
        ... 17 more
2016-09-09 16:27:15,653 INFO  [hconnection-0x71ad3d8a-shared--pool21-t10] 
client.ScannerCallable(261): Got exception making request scanner_id: 1 
number_of_rows: 100 close_scanner: false next_call_seq: 0 
client_handles_partials: true client_handles_heartbeats: true renew: false to 
region=testScanThrowsException,,1473463632707.b2adfb618e5d0fe225c1dc40c0eabfee.,
 hostname=hw10676,51833,1473463626529, seqNum=2
java.io.IOException: java.io.IOException: Could not reseek 
StoreFileScanner[HFileScanner for reader 
reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
 compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, 
currentSize=1567264, freeSize=1525578848, maxSize=1527146112, heapSize=1567264, 
minSize=1450788736, minFactor=0.95, multiSize=725394368, multiFactor=0.5, 
singleSize=362697184, singleFactor=0.25}, cacheDataOnRead=true, 
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, 
cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, 
firstKey=aaa/testFamily:testFamily/1473463633859/Put, 
lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, 
avgValueLen=3, entries=17576, length=866998, 
cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key 
/testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224)
        at 
org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
        at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: foo bar baz
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207)
        ... 17 more

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
        at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:332)
        at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:256)
        at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:1)
        at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:364)
        at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:1)
        at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
        at 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for reader 
reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
 compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, 
currentSize=1567264, freeSize=1525578848, maxSize=1527146112, heapSize=1567264, 
minSize=1450788736, minFactor=0.95, multiSize=725394368, multiFactor=0.5, 
singleSize=362697184, singleFactor=0.25}, cacheDataOnRead=true, 
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, 
cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, 
firstKey=aaa/testFamily:testFamily/1473463633859/Put, 
lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, 
avgValueLen=3, entries=17576, length=866998, 
cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key 
/testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224)
        at 
org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
        at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: foo bar baz
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207)
        ... 17 more

        at 
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1225)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
        at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32741)
        at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:214)
        ... 9 more
2016-09-09 16:27:15,963 INFO  
[B.priority.fifo.QRpcServer.handler=8,queue=0,port=51824] 
regionserver.RSRpcServices(2196): scan request:region { type: REGION_NAME 
value: "hbase:meta,,1" } scan { start_row: 
"testScanThrowsException,,99999999999999" max_versions: 1 cache_blocks: true 
small: true reversed: true caching: 1 } number_of_rows: 1 close_scanner: true 
client_handles_partials: true client_handles_heartbeats: true
2016-09-09 16:27:15,965 INFO  
[B.priority.fifo.QRpcServer.handler=8,queue=0,port=51824] 
regionserver.RSRpcServices(2549): class 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:cells_per_result:
 4 scanner_id: 15 more_results: false stale: false partial_flag_per_result: 
false more_results_in_region: true heartbeat_message: false
2016-09-09 16:27:15,967 INFO  [B.fifo.QRpcServer.handler=0,queue=0,port=51833] 
regionserver.RSRpcServices(2196): scan request:scanner_id: 1 number_of_rows: 
100 close_scanner: false next_call_seq: 0 client_handles_partials: true 
client_handles_heartbeats: true renew: false
2016-09-09 16:27:15,967 INFO  [B.fifo.QRpcServer.handler=0,queue=0,port=51833] 
regionserver.RSRpcServices(2549): class 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:scanner_id:
 1 more_results: true ttl: 60000 stale: false more_results_in_region: false
2016-09-09 16:27:15,968 INFO  [B.fifo.QRpcServer.handler=1,queue=0,port=51833] 
regionserver.RSRpcServices(2196): scan request:scanner_id: 1 number_of_rows: 0 
close_scanner: true client_handles_partials: true client_handles_heartbeats: 
true
2016-09-09 16:27:15,968 INFO  [B.fifo.QRpcServer.handler=1,queue=0,port=51833] 
regionserver.RSRpcServices(2549): class 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:scanner_id:
 1 more_results: false
2016-09-09 16:27:15,971 INFO  [main] hbase.HBaseTestingUtility(1111): Shutting 
down minicluster
{code}

The ScannerCallable is retryable if it gets a IOException. Before the 
HBASE-13662 fix, the retry was still failing due to OutOfOrderScannerException, 
which then causes the Scanner stack to be reset, thus masking the issue. After 
HBASE-13662, we can retry the same scan, but in case of we are doing a 
{{reseek()}} which fails in middle with an IOException, the KVHeaps and the 
Scanner state on the server side is left dirty. 

It seems to be much safer to reset the scanner state in case of IOExceptions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to