[ https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377291#comment-16377291 ]
Ted Yu commented on HBASE-20081:
--------------------------------
After a few of these:
{code}
Thread 22 (Time-limited test):
State: RUNNABLE
Blocked count: 583
Waited count: 1063
Stack:
sun.management.ThreadImpl.getThreadInfo1(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:169)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:135)
org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:385)
org.apache.hadoop.hbase.MiniHBaseCluster.waitUntilShutDown(MiniHBaseCluster.java:867)
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:1133)
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:1108)
{code}
The final stack trace for the same thread contained:
{code}
"Time-limited test" daemon prio=5 tid=22 runnable
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
at io.netty.channel.nio.NioEventLoop.wakeup(NioEventLoop.java:591)
at io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:561)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:146)
at io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:69)
at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.close(DatanodeHttpServer.java:266)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2006)
at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNode(MiniDFSCluster.java:2015)
at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:2005)
at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1984)
at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1958)
at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1951)
at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniDFSCluster(HBaseTestingUtility.java:767)
at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:1109)
at org.apache.hadoop.hbase.master.procedure.TestTableDDLProcedureBase.cleanupTest(TestTableDDLProcedureBase.java:53)
{code}
It seems that the test was waiting for the DataNode to shut down.
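The first dump above is the kind of output printed each time the shutdown wait loop finds a thread still alive. As a rough illustration only (this is not the HBase Threads.threadDumpingIsAlive code; BoundedJoin, joinWithDumps and its parameters are hypothetical names), a bounded variant of that pattern using only JDK APIs might look like the following, so that a stuck shutdown is reported after a fixed number of dumps instead of hanging:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of HBase or Hadoop.
final class BoundedJoin {

    /**
     * Waits for target to exit, dumping all threads to stderr after each
     * interval in which it is still alive. Gives up after maxWaits intervals.
     * Returns true if target exited, false if it was still alive at the end.
     */
    static boolean joinWithDumps(Thread target, long intervalMillis, int maxWaits)
            throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < maxWaits && target.isAlive(); i++) {
            target.join(intervalMillis);
            if (target.isAlive()) {
                // ThreadInfo.toString() includes the thread state and a partial stack.
                for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
                    System.err.print(info);
                }
            }
        }
        return !target.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread quick = new Thread(() -> { });
        quick.start();
        System.out.println("exited: " + joinWithDumps(quick, 100, 3));
    }
}
```

Whether a bounded wait is appropriate for the mini cluster shutdown is a separate question; the sketch only shows the wait-and-dump pattern.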
> TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
> ------------------------------------------------------------------------------
>
> Key: HBASE-20081
> URL: https://issues.apache.org/jira/browse/HBASE-20081
> Project: HBase
> Issue Type: Test
> Reporter: Ted Yu
> Priority: Major
>
> https://builds.apache.org/job/HBase-2.0-hadoop3-tests/lastCompletedBuild/org.apache.hbase$hbase-server/testReport/org.apache.hadoop.hbase.master.procedure/TestDisableTableProcedure/org_apache_hadoop_hbase_master_procedure_TestDisableTableProcedure/
> was one recent occurrence.
> I noticed two things in test output:
> {code}
> 2018-02-25 18:12:45,053 WARN [Time-limited test-EventThread] master.RegionServerTracker(136): asf912.gq1.ygridcore.net,45649,1519582305777 is not online or isn't known to the master.The latter could be caused by a DNS misconfiguration.
> {code}
> Since DNS misconfiguration was very unlikely on Apache Jenkins nodes, the
> above should not have been logged.
> {code}
> 2018-02-25 18:16:51,531 WARN [master/asf912:0.Chore.1] master.CatalogJanitor(127): Failed scan of catalog table
> java.io.IOException: connection is closed
> at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:263)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:761)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:680)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:675)
> at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:188)
> at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:140)
> at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:246)
> at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:119)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> {code}
> The above was possibly related to the lost region server.
> I searched the test output of a successful run; neither of the two warnings
> above appeared there.