[ https://issues.apache.org/jira/browse/HBASE-16604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15520319#comment-15520319 ]
Hudson commented on HBASE-16604: -------------------------------- FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #23 (See [https://builds.apache.org/job/HBase-1.3-JDK8/23/]) HBASE-16604 Scanner retries on IOException can cause the scans to miss (jerryjch: rev 49a4980e6dac1e74275ae5b042b01cd27efc8ebd) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/DelegatingKeyValueScanner.java > Scanner retries on IOException can cause the scans to miss data > ---------------------------------------------------------------- > > Key: HBASE-16604 > URL: https://issues.apache.org/jira/browse/HBASE-16604 > Project: HBase > Issue Type: Bug > Components: regionserver, Scanners > Reporter: Enis Soztutar > Assignee: Enis Soztutar > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4 > > Attachments: HBASE-16604-branch-1.3-addendum.patch, > hbase-16604_v1.patch, hbase-16604_v2.patch, hbase-16604_v3.branch-1.patch, > hbase-16604_v3.patch > > > Debugging an ITBLL failure, where the Verify did not "see" all the data in > the cluster, I've noticed that if we end up getting a generic IOException > from the HFileReader level, we may end up missing the rest of the data in the > region. I was able to manually test this, and this stack trace helps to > understand what is going on: > {code} > 2016-09-09 16:27:15,633 INFO [hconnection-0x71ad3d8a-shared--pool21-t9] > client.ScannerCallable(376): Open scanner=1 for > scan={"loadColumnFamiliesOnDemand":null,"startRow":"","stopRow":"","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":2097152,"families":{"testFamily":["testFamily"]},"caching":100,"maxVersions":1,"timeRange":[0,9223372036854775807]} > on region > region=testScanThrowsException,,1473463632707.b2adfb618e5d0fe225c1dc40c0eabfee., > hostname=hw10676,51833,1473463626529, seqNum=2 > 2016-09-09 16:27:15,634 INFO > [B.fifo.QRpcServer.handler=5,queue=0,port=51833] > regionserver.RSRpcServices(2196): scan request:scanner_id: 1 number_of_rows: > 100 close_scanner: false next_call_seq: 0 client_handles_partials: true > client_handles_heartbeats: true renew: false > 2016-09-09 16:27:15,635 INFO > [B.fifo.QRpcServer.handler=5,queue=0,port=51833] > regionserver.RSRpcServices(2510): Rolling back next call seqId > 2016-09-09 16:27:15,635 INFO > [B.fifo.QRpcServer.handler=5,queue=0,port=51833] > regionserver.RSRpcServices(2565): Throwing new > ServiceExceptionjava.io.IOException: Could not reseek > StoreFileScanner[HFileScanner for reader > reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c, > compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, > currentSize=1567264, freeSize=1525578848, maxSize=1527146112, > heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368, > multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, > lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, > avgValueLen=3, entries=17576, length=866998, > cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key > /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0 > 2016-09-09 16:27:15,635 DEBUG > [B.fifo.QRpcServer.handler=5,queue=0,port=51833] ipc.CallRunner(110): > B.fifo.QRpcServer.handler=5,queue=0,port=51833: callId: 26 service: > ClientService methodName: Scan size: 26 connection: 192.168.42.75:51903 > java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for > reader > reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c, > compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, > currentSize=1567264, freeSize=1525578848, maxSize=1527146112, > heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368, > multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, > lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, > avgValueLen=3, entries=17576, length=866998, > cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key > /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0 > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224) > at > org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107) > at > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133) > at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: foo bar baz > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207) > ... 17 more > 2016-09-09 16:27:15,653 INFO [hconnection-0x71ad3d8a-shared--pool21-t10] > client.ScannerCallable(261): Got exception making request scanner_id: 1 > number_of_rows: 100 close_scanner: false next_call_seq: 0 > client_handles_partials: true client_handles_heartbeats: true renew: false to > region=testScanThrowsException,,1473463632707.b2adfb618e5d0fe225c1dc40c0eabfee., > hostname=hw10676,51833,1473463626529, seqNum=2 > java.io.IOException: java.io.IOException: Could not reseek > StoreFileScanner[HFileScanner for reader > reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c, > compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, > currentSize=1567264, freeSize=1525578848, maxSize=1527146112, > heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368, > multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, > lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, > avgValueLen=3, entries=17576, length=866998, > cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key > /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0 > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224) > at > org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107) > at > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133) > at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: foo bar baz > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207) > ... 17 more > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:332) > at > org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:256) > at > org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:1) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) > at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:364) > at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:1) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): > java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for > reader > reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c, > compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, > currentSize=1567264, freeSize=1525578848, maxSize=1527146112, > heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368, > multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, > lastKey=zzz/testFamily:testFamily/1473463634271/Put, avgKeyLen=35, > avgValueLen=3, entries=17576, length=866998, > cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0] to key > /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0 > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224) > at > org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107) > at > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133) > at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: foo bar baz > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207) > ... 17 more > at > org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1225) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32741) > at > org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:214) > ... 9 more > 2016-09-09 16:27:15,963 INFO > [B.priority.fifo.QRpcServer.handler=8,queue=0,port=51824] > regionserver.RSRpcServices(2196): scan request:region { type: REGION_NAME > value: "hbase:meta,,1" } scan { start_row: > "testScanThrowsException,,99999999999999" max_versions: 1 cache_blocks: true > small: true reversed: true caching: 1 } number_of_rows: 1 close_scanner: true > client_handles_partials: true client_handles_heartbeats: true > 2016-09-09 16:27:15,965 INFO > [B.priority.fifo.QRpcServer.handler=8,queue=0,port=51824] > regionserver.RSRpcServices(2549): class > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:cells_per_result: > 4 scanner_id: 15 more_results: false stale: false partial_flag_per_result: > false more_results_in_region: true heartbeat_message: false > 2016-09-09 16:27:15,967 INFO > [B.fifo.QRpcServer.handler=0,queue=0,port=51833] > regionserver.RSRpcServices(2196): scan request:scanner_id: 1 number_of_rows: > 100 close_scanner: false next_call_seq: 0 client_handles_partials: true > client_handles_heartbeats: true renew: false > 2016-09-09 16:27:15,967 INFO > [B.fifo.QRpcServer.handler=0,queue=0,port=51833] > regionserver.RSRpcServices(2549): class > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:scanner_id: > 1 more_results: true ttl: 60000 stale: false more_results_in_region: false > 2016-09-09 16:27:15,968 INFO > [B.fifo.QRpcServer.handler=1,queue=0,port=51833] > regionserver.RSRpcServices(2196): scan request:scanner_id: 1 number_of_rows: > 0 close_scanner: true client_handles_partials: true > client_handles_heartbeats: true > 2016-09-09 16:27:15,968 INFO > [B.fifo.QRpcServer.handler=1,queue=0,port=51833] > regionserver.RSRpcServices(2549): class > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:scanner_id: > 1 more_results: false > 2016-09-09 16:27:15,971 INFO [main] hbase.HBaseTestingUtility(1111): > Shutting down minicluster > {code} > The ScannerCallable is retryable if it gets a IOException. Before the > HBASE-13662 fix, the retry was still failing due to > OutOfOrderScannerException, which then causes the Scanner stack to be reset, > thus masking the issue. After HBASE-13662, we can retry the same scan, but in > case of we are doing a {{reseek()}} which fails in middle with an > IOException, the KVHeaps and the Scanner state on the server side is left > dirty. > It seems to be much safer to reset the scanner state in case of IOExceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)