[
https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348001#comment-16348001
]
stack commented on HBASE-19902:
-------------------------------
So, on hadoopqa, I tried upping the proc limit gradually. 6k did not seem to be
enough, nor did 8k (or there were too many concurrent builds also using up
processes). Our test suite's use of the proc limit is a project we need to work
on. Our hadoopqa job now dumps more reporting on what else is running at test
start; you should be able to see any concurrent heavyweights, such as other
hbase test suites, if you look in the build artifacts under 'computer'. Will
file a follow-up to work on our resource usage as tests run (still too many
threads!!!). For now hadoopqa is set to 10k, which is kinda useless going by
[~aw]'s assessment of the limit and how counts are done. That's where we are
at. Will see how it does over the next few days.
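
For anyone who wants to compare thread usage against the limit on a build
host, here is a minimal sketch. It is not part of the hadoopqa job. It assumes
a Linux host with a readable /proc, and it relies on the fact that on Linux
RLIMIT_NPROC is counted in threads per real user ID rather than in whole
processes. The function names (parse_status, count_user_threads) are made up
for this illustration.

```python
import os


def parse_status(status_text):
    """Return (real_uid, thread_count) from the text of a /proc/<pid>/status file."""
    uid = threads = None
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Uid":
            uid = int(value.split()[0])  # first field is the real UID
        elif key == "Threads":
            threads = int(value.strip())
    return uid, threads


def count_user_threads(uid, proc_root="/proc"):
    """Sum the Threads: field over every process whose real UID matches uid."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                owner, threads = parse_status(fh.read())
        except OSError:
            continue  # process exited in the meantime, or is unreadable
        if owner == uid and threads is not None:
            total += threads
    return total


if __name__ == "__main__" and os.path.isdir("/proc"):
    import resource  # Unix only

    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    used = count_user_threads(os.getuid())
    limit = "unlimited" if soft == resource.RLIM_INFINITY else soft
    print(f"threads owned by uid {os.getuid()}: {used} (soft nproc limit: {limit})")
```

Run as the build user, this prints one line with the current thread total and
the soft limit. The total covers every job that user is running on the host,
not just one test suite.
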
> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -----------------------------------------------------------------
>
> Key: HBASE-19902
> URL: https://issues.apache.org/jira/browse/HBASE-19902
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Priority: Major
> Attachments: HBASE-19902.temporary-2.001.patch
>
>
> Trying to figure out what is going on w/ jenkins build....
> Changed the hadoopqa config to output long process listing rather than just
> 'java'...
> I can't get loadavg... tried dumping /proc...
> /tmp/jenkins6485196190911961762.sh: line 48: /loadavg: Permission denied
> Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/11273/console,
> see 7 java processes running on H2. Extra args on ps may help show whether
> they are zombies or us.
> Test run was fine, then fell into hbase-server second part and soon after
> started failing..
> https://builds.apache.org/job/PreCommit-HBASE-Build/11273/artifact/patchprocess/patch-unit-hbase-server.txt
> Looking at first test failure... this is where main thread is, trying to get
> thread info:
> {code}
> Thread 23 (Time-limited test):
> State: RUNNABLE
> Blocked count: 118
> Waited count: 58
> Stack:
> sun.management.ThreadImpl.getThreadInfo1(Native Method)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:168)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:498)
> org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
> org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191)
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
> org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
> {code}
> Master is not coming up....
> {code}
> 2018-01-31 02:22:31,474 ERROR [Time-limited test] hbase.MiniHBaseCluster(267): Error starting cluster
> java.lang.RuntimeException: Master not active after 30000ms
> at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:192)
> at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
> at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
> at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
> at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
> at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
> at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
> at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
> at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
> at org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Next test starts but doesn't complete.
> Running findHangingTests it finds 24 hung and 151 that have not timed out....
> Trying a few things:
> Set yetus version for hadoopqa temporarily back to 0.6.0 and started this
> build:
> https://builds.apache.org/job/PreCommit-HBASE-Build/11281/console
> ... and this one:
> https://builds.apache.org/job/PreCommit-HBASE-Build/11282/console