[
https://issues.apache.org/jira/browse/YETUS-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224079#comment-16224079
]
Allen Wittenauer edited comment on YETUS-570 at 10/29/17 4:24 PM:
------------------------------------------------------------------
After a terrible sleep, two ideas:
* On bash4, change the order up:
** Remove the ulimit from its current location
** Run echo_and_redirect as a coproc with the ulimit in place. This would
allow the primary test-patch code to run unrestricted by putting the
proc limit walls up only around ant, make, maven, etc.
* Prior to launching Docker, yetus should kill processes that match reaper
names and are older than X days. This would help with things like what I'm
seeing on H3:
{code}
jenkins 4228 4203 0 Jul18 ? 00:00:00 /bin/bash -c java
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n
-Dlog4j.properties=custom_log4j.properties
-Djava.io.tmpdir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-localDir-nm-0_0/usercache/jenkins/appcache/application_1500396215695_0001/container_1500396215695_0001_01_000001/tmp
-Xmx96m -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/dt-heap-1.bin
-Dhadoop.root.logger=DEBUG,RFA
-Dhadoop.log.dir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001
-Ddt.attr.APPLICATION_PATH=file:/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/testSetupShutdown
-Dapex.application.name=$'testApp' -Dlog4j.debug=true
com.datatorrent.stram.StreamingAppMaster
1>/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001/AppMaster.stdout
2>/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001/AppMaster.stderr
jenkins 4257 4228 0 Jul18 ? 01:21:58 java
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n
-Dlog4j.properties=custom_log4j.properties
-Djava.io.tmpdir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-localDir-nm-0_0/usercache/jenkins/appcache/application_1500396215695_0001/container_1500396215695_0001_01_000001/tmp
-Xmx96m -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/dt-heap-1.bin
-Dhadoop.root.logger=DEBUG,RFA
-Dhadoop.log.dir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001
-Ddt.attr.APPLICATION_PATH=file:/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/testSetupShutdown
-Dapex.application.name=testApp -Dlog4j.debug=true
com.datatorrent.stram.StreamingAppMaster
jenkins 13391 1 0 Sep21 ? 00:48:46
/usr/local/jenkins/java/jdk1.7.0_55/jre/bin/java -Xmx2048m -XX:MaxPermSize=768m
-XX:+HeapDumpOnOutOfMemoryError -DminiClusterDedicatedDirs=true -jar
/home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefirebooter6160994687041676643.jar
/home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire3219510197920875932tmp
/home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire_7295825218010268940443tmp
jenkins 19952 1 0 Sep11 ? 01:01:57
/usr/local/jenkins/java/jdk1.7.0_55/jre/bin/java -Xmx2048m -XX:MaxPermSize=768m
-XX:+HeapDumpOnOutOfMemoryError -DminiClusterDedicatedDirs=true -jar
/home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefirebooter741259645380147856.jar
/home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire7882552254579184169tmp
/home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire_7302959699799093808908tmp
jenkins 22540 1 0 Aug30 ? 01:17:34
/usr/local/jenkins/java/jdk1.7.0_55/jre/bin/java -Xmx2048m -XX:MaxPermSize=768m
-XX:+HeapDumpOnOutOfMemoryError -DminiClusterDedicatedDirs=true -jar
/home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefirebooter2604914348066585301.jar
/home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire6579405034700130938tmp
/home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire_730278414035615526144tmp
jenkins 28511 1 64 Sep17 ? 27-10:53:42
/usr/local/asfpackages/java/jdk1.8.0_144/jre/bin/java -Xmx1024m -da -jar
/home/jenkins/jenkins-slave/workspace/PreCommit-OOZIE-Build/core/target/surefire/surefirebooter7264422334614348257.jar
/home/jenkins/jenkins-slave/workspace/PreCommit-OOZIE-Build/core/target/surefire/surefire1763721606174158064tmp
/home/jenkins/jenkins-slave/workspace/PreCommit-OOZIE-Build/core/target/surefire/surefire_3143712379344115217tmp
{code}
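A rough sketch of what the second idea could look like, not anything that exists in Yetus today. It assumes a procps-style ps that supports the etimes (elapsed seconds) output column. The function name find_stale_pids, the example name patterns, and the 7-day threshold are all made up for illustration. The function only prints candidate PIDs; the actual kill is left as a commented-out example.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Yetus): read lines of the form
#   PID ELAPSED_SECONDS COMMAND...
# on stdin and print every PID whose command line matches the given
# extended regex and whose age is greater than max_days.
find_stale_pids() {
  local pattern=$1
  local max_days=$2
  local max_secs=$(( max_days * 86400 ))
  local pid etimes args

  while read -r pid etimes args; do
    # Skip anything that does not have a numeric elapsed-time field.
    [[ "${etimes}" =~ ^[0-9]+$ ]] || continue
    if (( 10#${etimes} > max_secs )) && [[ "${args}" =~ ${pattern} ]]; then
      echo "${pid}"
    fi
  done
}

# Example wiring, commented out so nothing gets killed by accident.
# Assumes procps ps (etimes column) and GNU xargs (-r); the patterns are
# guesses based on the listing above:
#
# ps -u "${USER}" -o pid=,etimes=,args= \
#   | find_stale_pids 'surefirebooter|StreamingAppMaster' 7 \
#   | xargs -r kill
```

Reading the process list from stdin keeps the age and name matching separate from the kill, so the selection can be logged or reported first, which also fits the "report" half of this issue.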
> Report and optionally kill stale JVMs between unit test modules
> ---------------------------------------------------------------
>
> Key: YETUS-570
> URL: https://issues.apache.org/jira/browse/YETUS-570
> Project: Yetus
> Issue Type: New Feature
> Components: Test Patch
> Affects Versions: 0.6.0
> Reporter: Allen Wittenauer
> Assignee: Allen Wittenauer
> Attachments: YETUS-570.wip.00.patch, YETUS-570.wip.01.patch
>
>
> YETUS-561 does a great job of preventing fork bombs and stale processes from
> destroying machines. However, Yetus should do a better job of:
> a) Reporting when that happens
> b) Preventing that from happening