Pinged Bobby who has access to the machines, and this is indeed what is happening. There were two cases:

1) When TestUberAM times out, the minicluster it is running, along with its MRAppMaster process, can escape. There are a ton of threads in those processes, so it doesn't take very many of these leaks before the process ulimit is hit.

2) There were a couple of other surefire processes that had leaked, but they had no discernible state left that would identify which test they were running, other than that it was something inside mapreduce-client-jobclient (which could still be TestUberAM). The main thread and most other non-daemon threads were gone, but a lone SocketReader thread was still hanging around. It wasn't a daemon thread and was apparently the only thread keeping the JVM alive.
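
For anyone repeating the check in case 2, here is a minimal sketch of picking the non-daemon Java threads out of a thread dump (for example one captured with jstack). It assumes the HotSpot dump layout, where each thread header starts with the quoted thread name, daemon threads carry a "daemon" token after it, and Java threads are followed by a java.lang.Thread.State line. The function name and the sample dump are made up for illustration and are not taken from the build machines.

```python
import re

# A thread header line looks like: "name" [#N] [daemon] prio=... tid=... nid=...
HEADER = re.compile(r'^"(.*)"(.*)$')


def non_daemon_java_threads(dump_text):
    """Return the names of Java threads in a thread dump that are not daemon threads."""
    lines = dump_text.splitlines()
    names = []
    for i, line in enumerate(lines):
        match = HEADER.match(line)
        if not match:
            continue
        # Java threads are followed by a Thread.State line; VM-internal
        # threads (GC workers, "VM Thread") are not, so skip those.
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if "java.lang.Thread.State:" not in following:
            continue
        if "daemon" in match.group(2).split():
            continue
        names.append(match.group(1))
    return names


# Hypothetical, heavily shortened dump used only to exercise the parser.
SAMPLE = '''\
"SocketReader" prio=10 tid=0x1 nid=0x2 runnable [0x3]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x4 nid=0x5 runnable [0x0]
   java.lang.Thread.State: RUNNABLE

"VM Thread" prio=10 tid=0x6 nid=0x7 runnable
'''

print(non_daemon_java_threads(SAMPLE))  # prints ['SocketReader']
```

Any thread this returns is a candidate for what is holding the JVM open. HotSpot's own DestroyJavaVM thread may also show up in a real dump; it is waiting on the other non-daemon threads rather than keeping the process alive itself.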

So we need to prioritize fixing the TestUberAM hang, currently tracked by MAPREDUCE-5481 <https://issues.apache.org/jira/browse/MAPREDUCE-5481> and/or find a way to keep it from escaping during builds. There might be another issue where SocketReader threads can prevent the JVM from shutting down completely in some cases.

Jason

On 10/31/2013 08:19 AM, Jason Lowe wrote:
I don't think the OOM error below indicates that it needs more heap space, as it's complaining about being unable to create a new native thread. That is usually caused by a lack of available virtual address space or by hitting process ulimits.

What's most likely going on is the jenkins user is hitting a process ulimit. This can occur if processes have "leaked" from previous build/test runs and are using a large number of threads, or a large number of processes have leaked overall. Could someone with access to the build machines check if that is indeed the case? If so, bonus points for identifying the source of the leak. ;-)
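
A rough sketch of one way to run that check on a Linux build machine: sum the thread counts of every process owned by a user by reading /proc/<pid>/status, then compare the total with the user's max-processes limit (on Linux that limit counts threads as well as processes). The helper names are made up for illustration, and the script looks at whichever user runs it, so it would need to run as the jenkins user to be meaningful.

```python
import os


def parse_status(text):
    """Extract (real_uid, thread_count) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.split()
    return int(fields["Uid"][0]), int(fields["Threads"][0])


def thread_counts(uid, proc_root="/proc"):
    """Map pid -> thread count for every process owned by the given uid."""
    counts = {}
    if not os.path.isdir(proc_root):
        return counts
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                owner, threads = parse_status(f.read())
        except (OSError, KeyError, IndexError, ValueError):
            continue  # process exited meanwhile, or its status is unreadable
        if owner == uid:
            counts[int(entry)] = threads
    return counts


if __name__ == "__main__":
    import resource

    # Assumption: the script is run as the build user (e.g. jenkins).
    counts = thread_counts(os.getuid())
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NPROC)
    print("processes:", len(counts))
    print("threads:", sum(counts.values()), "soft limit:", soft_limit)
    # The ten processes with the most threads are the likeliest leak suspects.
    for pid, threads in sorted(counts.items(), key=lambda kv: -kv[1])[:10]:
        print(pid, threads)
```

If the thread total is close to the limit while no build is running, the top entries in that list are the leaked processes to look at first.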

Thanks!

Jason

On 10/30/2013 05:39 PM, Roman Shaposhnik wrote:
I can take a look sometime later today. In the meantime, I can only
say that I've been running into the 1 GB limit in a few builds
lately. These days I just go with 2G by default.

Thanks,
Roman.

On Wed, Oct 30, 2013 at 3:33 PM, Alejandro Abdelnur <t...@cloudera.com> wrote:
The following is happening in builds for MAPREDUCE and YARN patches.
I've seen the failures on the hadoop5 and hadoop7 machines. I've increased
Maven memory to 1GB (export MAVEN_OPTS="-Xmx1024m" in the jenkins
jobs) but still some failures persist:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4159/

Does anybody have an idea of what may be going on?



thx


[INFO] --- native-maven-plugin:1.0-alpha-7:javah (default) @ hadoop-common ---
[INFO] /bin/sh -c cd
/home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-common-project/hadoop-common
&& /home/jenkins/tools/java/latest/bin/javah -d
/home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-common-project/hadoop-common/target/native/javah -classpath /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-common-project/hadoop-common/target/classes:/home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-common-project/hadoop-annotations/target/classes:/home/jenkins/tools/java/jdk1.6.0_26/jre/../lib/tools.jar:/home/jenkins/.m2/repository/com/google/guava/guava/11.0.2/guava-11.0.2.jar:/home/jenkins/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/home/jenkins/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/home/jenkins/.m2/repository/org/apache/commons/commons-math/2.1/commons-math-2.1.jar:/home/jenkins/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/home/jenkins/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/home/jenkins/.m2/repository/commons-codec/commons-codec/1.4/commons-codec-1.4.jar:/home/jenkins/.m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar:/home/jenkins/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.jar:/home/jenkins/.m2/repository/javax/servlet/servlet-api/2.5/servlet-api-2.5.jar:/home/jenkins/.m2/repository/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.jar:/home/jenkins/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/home/jenkins/.m2/repository/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar:/home/jenkins/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar:/home/jenkins/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar:/home/jenkins/.m2/repository/stax/stax-api/1.0.1/stax-api-1.0.1.jar:/home/jenkins/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar:/home/jenkins/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/home/jenkins/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:/home/jenkins/.m2/repository/org/codehaus/j
ackson/jackson-jaxrs/1.8.8/jackson-jaxrs-1.8.8.jar:/home/jenkins/.m2/repository/org/codehaus/jackson/jackson-xc/1.8.8/jackson-xc-1.8.8.jar:/home/jenkins/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar:/home/jenkins/.m2/repository/asm/asm/3.2/asm-3.2.jar:/home/jenkins/.m2/repository/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar:/home/jenkins/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/jenkins/.m2/repository/net/java/dev/jets3t/jets3t/0.6.1/jets3t-0.6.1.jar:/home/jenkins/.m2/repository/commons-lang/commons-lang/2.5/commons-lang-2.5.jar:/home/jenkins/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/home/jenkins/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/home/jenkins/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/home/jenkins/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/home/jenkins/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/home/jenkins/.m2/repository/org/slf4j/slf4j-api/1.7.5/slf4j-api-1.7.5.jar:/home/jenkins/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.8.8/jackson-core-asl-1.8.8.jar:/home/jenkins/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.8.8/jackson-mapper-asl-1.8.8.jar:/home/jenkins/.m2/repository/org/apache/avro/avro/1.7.4/avro-1.7.4.jar:/home/jenkins/.m2/repository/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar:/home/jenkins/.m2/repository/org/xerial/snappy/snappy-java/1.0.4.1/snappy-java-1.0.4.1.jar:/home/jenkins/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar:/home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-common-project/hadoop-auth/target/classes:/home/jenkins/.m2/repository/com/jcraft/jsch/0.1.42/jsch-0.1.42.jar:/home/jenkins/.m2/repository/org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.jar:
/home/jenkins/.m2/repository/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar:/home/jenkins/.m2/repository/org/tukaani/xz/1.0/xz-1.0.jar
org.apache.hadoop.io.compress.zlib.ZlibCompressor
org.apache.hadoop.io.compress.zlib.ZlibDecompressor
org.apache.hadoop.io.compress.bzip2.Bzip2Compressor
org.apache.hadoop.io.compress.bzip2.Bzip2Decompressor
org.apache.hadoop.security.JniBasedUnixGroupsMapping
org.apache.hadoop.io.nativeio.NativeIO
org.apache.hadoop.security.JniBasedUnixGroupsNetgroupMapping
org.apache.hadoop.io.compress.snappy.SnappyCompressor
org.apache.hadoop.io.compress.snappy.SnappyDecompressor
org.apache.hadoop.io.compress.lz4.Lz4Compressor
org.apache.hadoop.io.compress.lz4.Lz4Decompressor
org.apache.hadoop.util.NativeCrc32
org.apache.hadoop.net.unix.DomainSocket
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread



--
Alejandro


