[ 
https://issues.apache.org/jira/browse/HBASE-16604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-16604:
----------------------------------
    Attachment: hbase-16604_v3.branch-1.patch

> Scanner retries on IOException can cause the scans to miss data 
> ----------------------------------------------------------------
>
>                 Key: HBASE-16604
>                 URL: https://issues.apache.org/jira/browse/HBASE-16604
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, Scanners
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>
>         Attachments: hbase-16604_v1.patch, hbase-16604_v2.patch, 
> hbase-16604_v3.branch-1.patch, hbase-16604_v3.patch
>
>
> Debugging an ITBLL failure, where the Verify did not "see" all the data in 
> the cluster, I've noticed that if we end up getting a generic IOException 
> from the HFileReader level, we may end up missing the rest of the data in the 
> region. I was able to manually test this, and this stack trace helps to 
> understand what is going on: 
> {code}
> 2016-09-09 16:27:15,633 INFO  [hconnection-0x71ad3d8a-shared--pool21-t9] 
> client.ScannerCallable(376): Open scanner=1 for 
> scan={"loadColumnFamiliesOnDemand":null,"startRow":"","stopRow":"","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":2097152,"families":{"testFamily":["testFamily"]},"caching":100,"maxVersions":1,"timeRange":[0,9223372036854775807]}
>  on region 
> region=testScanThrowsException,,1473463632707.b2adfb618e5d0fe225c1dc40c0eabfee.,
>  hostname=hw10676,51833,1473463626529, seqNum=2
> 2016-09-09 16:27:15,634 INFO  
> [B.fifo.QRpcServer.handler=5,queue=0,port=51833] 
> regionserver.RSRpcServices(2196): scan request:scanner_id: 1 number_of_rows: 
> 100 close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true renew: false
> 2016-09-09 16:27:15,635 INFO  
> [B.fifo.QRpcServer.handler=5,queue=0,port=51833] 
> regionserver.RSRpcServices(2510): Rolling back next call seqId
> 2016-09-09 16:27:15,635 INFO  
> [B.fifo.QRpcServer.handler=5,queue=0,port=51833] 
> regionserver.RSRpcServices(2565): Throwing new 
> ServiceExceptionjava.io.IOException: Could not reseek 
> StoreFileScanner[HFileScanner for reader 
> reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
>  compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, 
> currentSize=1567264, freeSize=1525578848, maxSize=1527146112, 
> heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368, 
> multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, 
> lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, 
> avgValueLen=3, entries=17576, length=866998, 
> cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key 
> /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
> 2016-09-09 16:27:15,635 DEBUG 
> [B.fifo.QRpcServer.handler=5,queue=0,port=51833] ipc.CallRunner(110): 
> B.fifo.QRpcServer.handler=5,queue=0,port=51833: callId: 26 service: 
> ClientService methodName: Scan size: 26 connection: 192.168.42.75:51903
> java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for 
> reader 
> reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
>  compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, 
> currentSize=1567264, freeSize=1525578848, maxSize=1527146112, 
> heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368, 
> multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, 
> lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, 
> avgValueLen=3, entries=17576, length=866998, 
> cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key 
> /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224)
>       at 
> org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
>       at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
>       at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625)
>       at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>       at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: foo bar baz
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207)
>       ... 17 more
> 2016-09-09 16:27:15,653 INFO  [hconnection-0x71ad3d8a-shared--pool21-t10] 
> client.ScannerCallable(261): Got exception making request scanner_id: 1 
> number_of_rows: 100 close_scanner: false next_call_seq: 0 
> client_handles_partials: true client_handles_heartbeats: true renew: false to 
> region=testScanThrowsException,,1473463632707.b2adfb618e5d0fe225c1dc40c0eabfee.,
>  hostname=hw10676,51833,1473463626529, seqNum=2
> java.io.IOException: java.io.IOException: Could not reseek 
> StoreFileScanner[HFileScanner for reader 
> reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
>  compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, 
> currentSize=1567264, freeSize=1525578848, maxSize=1527146112, 
> heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368, 
> multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, 
> lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, 
> avgValueLen=3, entries=17576, length=866998, 
> cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key 
> /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224)
>       at 
> org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
>       at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
>       at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625)
>       at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>       at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: foo bar baz
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207)
>       ... 17 more
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>       at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>       at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>       at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:332)
>       at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:256)
>       at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:1)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>       at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:364)
>       at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:1)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
>       at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for 
> reader 
> reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
>  compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, 
> currentSize=1567264, freeSize=1525578848, maxSize=1527146112, 
> heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368, 
> multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, 
> lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, 
> avgValueLen=3, entries=17576, length=866998, 
> cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key 
> /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224)
>       at 
> org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
>       at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
>       at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625)
>       at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>       at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: foo bar baz
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207)
>       ... 17 more
>       at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1225)
>       at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
>       at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32741)
>       at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:214)
>       ... 9 more
> 2016-09-09 16:27:15,963 INFO  
> [B.priority.fifo.QRpcServer.handler=8,queue=0,port=51824] 
> regionserver.RSRpcServices(2196): scan request:region { type: REGION_NAME 
> value: "hbase:meta,,1" } scan { start_row: 
> "testScanThrowsException,,99999999999999" max_versions: 1 cache_blocks: true 
> small: true reversed: true caching: 1 } number_of_rows: 1 close_scanner: true 
> client_handles_partials: true client_handles_heartbeats: true
> 2016-09-09 16:27:15,965 INFO  
> [B.priority.fifo.QRpcServer.handler=8,queue=0,port=51824] 
> regionserver.RSRpcServices(2549): class 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:cells_per_result:
>  4 scanner_id: 15 more_results: false stale: false partial_flag_per_result: 
> false more_results_in_region: true heartbeat_message: false
> 2016-09-09 16:27:15,967 INFO  
> [B.fifo.QRpcServer.handler=0,queue=0,port=51833] 
> regionserver.RSRpcServices(2196): scan request:scanner_id: 1 number_of_rows: 
> 100 close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true renew: false
> 2016-09-09 16:27:15,967 INFO  
> [B.fifo.QRpcServer.handler=0,queue=0,port=51833] 
> regionserver.RSRpcServices(2549): class 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:scanner_id:
>  1 more_results: true ttl: 60000 stale: false more_results_in_region: false
> 2016-09-09 16:27:15,968 INFO  
> [B.fifo.QRpcServer.handler=1,queue=0,port=51833] 
> regionserver.RSRpcServices(2196): scan request:scanner_id: 1 number_of_rows: 
> 0 close_scanner: true client_handles_partials: true 
> client_handles_heartbeats: true
> 2016-09-09 16:27:15,968 INFO  
> [B.fifo.QRpcServer.handler=1,queue=0,port=51833] 
> regionserver.RSRpcServices(2549): class 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:scanner_id:
>  1 more_results: false
> 2016-09-09 16:27:15,971 INFO  [main] hbase.HBaseTestingUtility(1111): 
> Shutting down minicluster
> {code}
> The ScannerCallable is retryable if it gets a IOException. Before the 
> HBASE-13662 fix, the retry was still failing due to 
> OutOfOrderScannerException, which then causes the Scanner stack to be reset, 
> thus masking the issue. After HBASE-13662, we can retry the same scan, but in 
> case of we are doing a {{reseek()}} which fails in middle with an 
> IOException, the KVHeaps and the Scanner state on the server side is left 
> dirty. 
> It seems to be much safer to reset the scanner state in case of IOExceptions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to