[ 
https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347195#comment-16347195
 ] 

Allen Wittenauer commented on HBASE-19902:
------------------------------------------

Copying my comment from HBASE-19887:

===

Chances are that the unit tests are going over the 5k mark. The number in the 
output is what was measured as successfully launched in a given interval. It 
does not measure how many threads were attempted. One way to further test this 
is to set proclimit to something higher (like 10k) and run on H30, which has 
a higher UserTasksMax configured.

===
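
As a rough illustration of the measured-versus-attempted distinction, here is 
a toy model. It is a sketch only, not how the harness actually samples. The 
function name, the all-or-nothing failure model, and every number except the 
5k cap are hypothetical.

```python
def simulate_launches(limit, live, requests):
    """Toy model of a per-user task cap (a stand-in for proclimit).

    limit    -- maximum number of live threads allowed
    live     -- threads already held by tests that are still running
    requests -- threads each newly starting test tries to create

    Assumption: a test that cannot get all of its threads blows up and
    releases whatever it grabbed, so it leaves no trace in a later sample.
    Returns (total_demand, failed_tests, live_after).
    """
    total_demand = live
    failed_tests = 0
    for wanted in requests:
        total_demand += wanted
        if live + wanted <= limit:
            live += wanted          # test starts and keeps its threads
        else:
            failed_tests += 1       # test dies, its threads are gone again
    return total_demand, failed_tests, live


# Hypothetical numbers: 3000 threads already live, three tests want 900 each.
demand, failed, observed = simulate_launches(5000, 3000, [900, 900, 900])
print(demand, failed, observed)
```

In this model the demand goes past the cap and one test fails, yet a sample 
taken afterwards only sees the threads that launched successfully, which is 
below the cap.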

Two other things:

* Be aware of parallelism.  If parallelism is set to five, two tests are 
running, and three new tests try to launch at the same time, each needing 
900 threads, the run will blow up but the number reported will be low.

* One of the outcomes of HDFS-12711 was finding out that surefire does not 
always report test failures, for example when surefire itself starts to 
OOM.  In other words, if surefire fails to launch a test, it may not record 
ANY result for it.  This means tests may have been failing before but were 
never reported as either success or failure.  They just never existed as far 
as the harness is concerned.  Now these tests are getting reported, because 
the lower limit means troubled tests fail quicker, freeing up more resources 
for surefire to keep pounding away.  See also SUREFIRE-1447.
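
One way to spot tests that never produced any result is to compare the test 
classes you expected to run against the report files surefire wrote. The 
sketch below assumes surefire's default report naming (TEST-<class>.xml under 
target/surefire-reports). The function name and the class names in the 
example are hypothetical.

```python
import re

# Default surefire XML report naming: TEST-<fully.qualified.Class>.xml
REPORT_RE = re.compile(r"^TEST-(?P<cls>.+)\.xml$")


def unreported_tests(expected_classes, report_filenames):
    """Return expected test classes that have no surefire XML report.

    expected_classes -- fully qualified class names that should have run
    report_filenames -- file names found in the surefire report directory
    """
    reported = set()
    for name in report_filenames:
        match = REPORT_RE.match(name)
        if match:
            reported.add(match.group("cls"))
    # Anything expected but not reported never existed as far as the
    # harness is concerned: neither success nor failure was recorded.
    return sorted(set(expected_classes) - reported)


# Hypothetical example: TestB was scheduled but left no report behind.
expected = ["org.example.TestA", "org.example.TestB"]
reports = ["TEST-org.example.TestA.xml", "org.example.TestA.txt"]
print(unreported_tests(expected, reports))
```

In practice the file names could come from os.listdir() on each module's 
target/surefire-reports directory.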

> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -----------------------------------------------------------------
>
>                 Key: HBASE-19902
>                 URL: https://issues.apache.org/jira/browse/HBASE-19902
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>         Attachments: HBASE-19902.temporary-2.001.patch
>
>
> Trying to figure out what is going on w/ jenkins build....
> Changed the hadoopqa config to output long process listing rather than just 
> 'java'... 
> I can't get loadavg... tried dumping /proc...
>  /tmp/jenkins6485196190911961762.sh: line 48: /loadavg: Permission denied
> Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/11273/console, 
> see 7 java processes running on H2. Extra args on ps may help show whether 
> it is zombies or us.
> Test run was fine, then fell into hbase-server second part and soon after 
> started failing..
> https://builds.apache.org/job/PreCommit-HBASE-Build/11273/artifact/patchprocess/patch-unit-hbase-server.txt
> Looking at first test failure... this is where main thread is, trying to get 
> thread info:
> {code}
> Thread 23 (Time-limited test):
>   State: RUNNABLE
>   Blocked count: 118
>   Waited count: 58
>   Stack:
>     sun.management.ThreadImpl.getThreadInfo1(Native Method)
>     sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
>     sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
>     org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:168)
>     sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     java.lang.reflect.Method.invoke(Method.java:498)
>     org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
>     org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
>     org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191)
>     org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
>     org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
>     org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
>     org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
> {code}
> Master is not coming up....
> {code}
> 2018-01-31 02:22:31,474 ERROR [Time-limited test] hbase.MiniHBaseCluster(267): Error starting cluster
> java.lang.RuntimeException: Master not active after 30000ms
>       at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:192)
>       at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
>       at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
>       at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
>       at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
>       at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
>       at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
>       at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
>       at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
>       at org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>       at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> Next test starts but doesn't complete.
> Running findHangingTests it finds 24 hung and 151 that have not timed out....
> Trying a few things:
> Set yetus version for hadoopqa temporarily back to 0.6.0 and started this 
> build:
> https://builds.apache.org/job/PreCommit-HBASE-Build/11281/console
> ... and this one:
> https://builds.apache.org/job/PreCommit-HBASE-Build/11282/console



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
