[
https://issues.apache.org/jira/browse/HBASE-14327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720087#comment-14720087
]
stack commented on HBASE-14327:
-------------------------------
The archiver is running in between the close and the open of the region in
its new location? Is the removed file the result of the compaction, or is it
one of the compaction inputs on the original region? The archiver thinks it
is safe to remove the file even though the compaction has not 'completed'
yet? That sounds like another issue. In this case, if the archiver is moving
the file aside before the new region open has a chance to work on it, then
yes, delay or stop the archiver in the test.
Thank you for digging in on this, [~chenheng].
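To make the suggested workaround concrete, here is a minimal sketch of the
gating idea. Everything in it (the BlockableArchiver class, its method names,
the 60s timeout) is hypothetical test scaffolding, not the actual
HFileArchiver API; it only illustrates holding the archive step back until
the test says the reopened region is ready.
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the archive step: the real HFileArchiver moves
// compacted store files to the archive directory as part of compaction
// completion. Here we just gate that step behind a latch the test controls.
public class BlockableArchiver {
  // Released by the test once the region open in the new location is done.
  private final CountDownLatch archiveAllowed = new CountDownLatch(1);

  // Called from the compaction-completion path before files are moved aside.
  public void archiveCompactedFiles(Iterable<String> files) throws InterruptedException {
    // Block until the test releases us; the timeout keeps a broken test
    // from hanging the whole suite.
    if (!archiveAllowed.await(60, TimeUnit.SECONDS)) {
      throw new IllegalStateException("test never released the archiver");
    }
    for (String file : files) {
      System.out.println("archiving (stubbed): " + file);
    }
  }

  // The test calls this after the reopened region has loaded its store files.
  public void allowArchiving() {
    archiveAllowed.countDown();
  }
}
{code}
This is the same latch pattern TestIOFencing already uses in
CompactionBlockerRegion to hold compactions back (visible at
TestIOFencing.java:127 in the trace below), just applied to the archive step
so the file stays in place until the reopened region has picked it up.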
> TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky
> --------------------------------------------------------------
>
> Key: HBASE-14327
> URL: https://issues.apache.org/jira/browse/HBASE-14327
> Project: HBase
> Issue Type: Bug
> Components: test
> Reporter: Dima Spivak
> Priority: Critical
>
> I'm looking into some more of the flaky tests on trunk and this one seems to
> be particularly gross, failing about half the time in recent days. Some
> probably-relevant output from [a recent
> run|https://builds.apache.org/job/HBase-TRUNK/6761/testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompactionAfterWALSync/]:
> {noformat}
> 2015-08-27 18:50:14,318 INFO [main] hbase.TestIOFencing(326): Allowing compaction to proceed
> 2015-08-27 18:50:14,318 DEBUG [main] hbase.TestIOFencing$CompactionBlockerRegion(110): allowing compactions
> 2015-08-27 18:50:14,318 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.HStore(1732): Removing store files after compaction...
> 2015-08-27 18:50:14,323 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1732): Removing store files after compaction...
> 2015-08-27 18:50:14,330 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(224): Archiving compacted store files.
> 2015-08-27 18:50:14,331 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(224): Archiving compacted store files.
> 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464
> 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe
> 2015-08-27 18:50:14,341 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc
> 2015-08-27 18:50:14,342 INFO [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1353): Completed compaction of 2 (all) file(s) in family of tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311. into e138bb0ec6c64ad19efab3b44dbbcb1a(size=68.7 K), total size for store is 146.9 K. This selection was in queue for 0sec, and took 10sec to execute.
> 2015-08-27 18:50:14,343 INFO [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.CompactSplitThread$CompactionRunner(527): Completed compaction: Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=2, fileSize=73.1 K, priority=998, time=525052314434020; duration=10sec
> 2015-08-27 18:50:14,343 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/2926c09f1941416eb557ee5d283d7e2b, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/2926c09f1941416eb557ee5d283d7e2b
> 2015-08-27 18:50:14,347 DEBUG [RS_CLOSE_REGION-hemera:35619-0] regionserver.HRegion(1405): Updates disabled for region tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.
> 2015-08-27 18:50:14,343 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.CompactSplitThread$CompactionRunner(550): CompactSplitThread Status: compaction_queue=(0:3), split_queue=0, merge_queue=0
> 2015-08-27 18:50:14,348 ERROR [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.CompactSplitThread$CompactionRunner(547): Compaction failed Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=2, fileSize=51.6 K, priority=998, time=525051341420855
> java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.HStore.logCompactionEndMessage(HStore.java:1343)
> 	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1259)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1812)
> 	at org.apache.hadoop.hbase.TestIOFencing$CompactionBlockerRegion.compact(TestIOFencing.java:127)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> 2015-08-27 18:50:14,349 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.CompactSplitThread$CompactionRunner(550): CompactSplitThread Status: compaction_queue=(0:3), split_queue=0, merge_queue=0
> 2015-08-27 18:50:14,351 INFO [StoreCloserThread-tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.-1] regionserver.HStore(889): Closed family
> 2015-08-27 18:50:14,352 ERROR [RS_CLOSE_REGION-hemera:35619-0] regionserver.HRegion(1493): Memstore size is 62664
> 2015-08-27 18:50:14,353 INFO [RS_CLOSE_REGION-hemera:35619-0] regionserver.HRegion(1506): Closed tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.
> 2015-08-27 18:50:14,353 DEBUG [RS_CLOSE_REGION-hemera:35619-0] handler.CloseRegionHandler(122): Closed tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.
> 2015-08-27 18:50:14,361 ERROR [RS:1;hemera:39431-shortCompactions-1440701414271] regionserver.CompactSplitThread$CompactionRunner(541): Compaction failed Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=9, fileSize=289.6 K (25.6 K, 47.0 K, 26.0 K, 68.7 K, 26.1 K, 26.1 K, 26.1 K, 26.1 K, 18.0 K), priority=991, time=525062316035397
> java.io.FileNotFoundException: File does not exist: /user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> 	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> 	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1167)
> 	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1155)
> 	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1145)
> 	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:268)
> 	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:235)
> 	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:228)
> 	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1318)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:293)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:289)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:289)
> 	at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:160)
> 	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
> 	at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:106)
> 	at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:82)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:248)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:385)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:492)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:115)
> 	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:99)
> 	at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createFileScanners(Compactor.java:203)
> 	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:70)
> 	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
> 	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1231)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1812)
> 	at org.apache.hadoop.hbase.TestIOFencing$CompactionBlockerRegion.compact(TestIOFencing.java:127)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1411)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy21.getBlockLocations(Unknown Source)
> 	at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy21.getBlockLocations(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:225)
> 	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
> 	at com.sun.proxy.$Proxy22.getBlockLocations(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1165)
> 	... 29 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)