[ https://issues.apache.org/jira/browse/HBASE-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406734#comment-15406734 ]
Ted Yu commented on HBASE-16349:
--------------------------------
The test failure was caused by the cluster being unable to start:
{code}
org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: The region 3043e95376d0c15ccbd93f71fddb69fa was already closing. New CLOSE request is ignored.
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2867)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.closeRegion(RSRpcServices.java:1240)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22741)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2264)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169)
at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:332)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1759)
at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:837)
at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1890)
at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:2014)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1620)
at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:48)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException): org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: The region 3043e95376d0c15ccbd93f71fddb69fa was already closing. New CLOSE request is ignored.
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2867)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.closeRegion(RSRpcServices.java:1240)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22741)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2264)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118)
{code}
> TestClusterId may hang during cluster shutdown
> ----------------------------------------------
>
> Key: HBASE-16349
> URL: https://issues.apache.org/jira/browse/HBASE-16349
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Minor
> Attachments: 16349.branch-1.v1.txt
>
>
> While running TestClusterId on branch-1, I observed the test hang during
> tearDown().
> {code}
> 2016-08-03 11:36:39,600 DEBUG [RS_CLOSE_META-cn012:49371-0] regionserver.HRegion(1415): Closing hbase:meta,,1.1588230740: disabling compactions & flushes
> 2016-08-03 11:36:39,600 DEBUG [RS_CLOSE_META-cn012:49371-0] regionserver.HRegion(1442): Updates disabled for region hbase:meta,,1.1588230740
> 2016-08-03 11:36:39,600 INFO [RS_CLOSE_META-cn012:49371-0] regionserver.HRegion(2253): Flushing 1/1 column families, memstore=232 B
> 2016-08-03 11:36:39,601 WARN [RS_OPEN_META-cn012:49371-0.append-pool17-t1] wal.FSHLog$RingBufferEventHandler(1900): Append sequenceId=8, requesting roll of WAL
> java.io.IOException: All datanodes DatanodeInfoWithStorage[127.0.0.1:37765,DS-9870993e-fb98-45fc-b151-708f72aa02d2,DISK] are bad. Aborting...
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1113)
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
> 2016-08-03 11:36:39,602 FATAL [RS_CLOSE_META-cn012:49371-0] regionserver.HRegionServer(2085): ABORTING region server cn012.l42scl.hortonworks.com,49371,1470249187586: Unrecoverable exception while closing region hbase:meta,,1.1588230740, still finishing close
> org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=8, requesting roll of WAL
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1902)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1754)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1676)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[127.0.0.1:37765,DS-9870993e-fb98-45fc-b151-708f72aa02d2,DISK] are bad. Aborting...
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1113)
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
> 2016-08-03 11:36:39,603 FATAL [RS_CLOSE_META-cn012:49371-0] regionserver.HRegionServer(2093): RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
> {code}
> This led to rst.join() hanging:
> {code}
> "RS:0;cn012:49371" #648 prio=5 os_prio=0 tid=0x00007fdab24b5000 nid=0x621a waiting on condition [0x00007fd785fe0000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.sleep(HRegionServer.java:1326)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:1312)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1082)
> {code}
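
To illustrate the hang in the thread dump above: if teardown joins the region server thread without a timeout, a thread that keeps sleeping in waitOnAllRegionsToClose() blocks the caller indefinitely. Below is a minimal sketch of a bounded join in plain Java. It makes no claim about what the attached patch does; BoundedJoin and joinWithTimeout are hypothetical names, not HBase APIs, and the "stuck" thread is only a stand-in for the region server thread.

```java
// Hypothetical helper, not part of HBase: wait for a thread, but only up to a deadline.
final class BoundedJoin {
    private BoundedJoin() {}

    /**
     * Waits up to timeoutMs for the thread to terminate.
     * Returns true if it terminated, false if it is still alive after the timeout.
     */
    static boolean joinWithTimeout(Thread t, long timeoutMs) throws InterruptedException {
        if (timeoutMs <= 0) {
            // Thread.join(0) means "wait forever", which is the behavior we want to avoid.
            throw new IllegalArgumentException("timeoutMs must be positive");
        }
        t.join(timeoutMs);
        return !t.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in (assumption) for a region server thread that never finishes closing regions.
        Thread stuck = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(50);
                }
            } catch (InterruptedException e) {
                // Interrupted: leave the loop and let the thread exit.
            }
        }, "stuck-rs");
        stuck.setDaemon(true);
        stuck.start();

        boolean finished = joinWithTimeout(stuck, 200);
        System.out.println("finished=" + finished);
        if (!finished) {
            // The caller can now decide what to do: here we interrupt and wait briefly.
            stuck.interrupt();
            stuck.join(1000);
        }
    }
}
```

With a bounded wait, the caller gets control back and can log a thread dump, interrupt the thread, or fail the test with a clear message instead of hanging until the surefire timeout.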