[ 
https://issues.apache.org/jira/browse/YETUS-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224079#comment-16224079
 ] 

Allen Wittenauer edited comment on YETUS-570 at 10/29/17 4:24 PM:
------------------------------------------------------------------

After a terrible sleep, two ideas:

* On bash4, change the order up: 
** Remove the ulimit from its current location
** Run echo_and_redirect as a coproc with the ulimit in place.  This would 
allow the primary test-patch code to run unrestricted by putting the 
proc limit walls up only around ant, make, maven, etc.
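
A rough sketch of the coproc idea, to make the intent concrete. This is not the actual echo_and_redirect; run_limited, the LIMITED coproc name, and PROC_LIMIT are hypothetical names used only for illustration, and the default limit of 1000 is an arbitrary assumption:

```bash
#!/usr/bin/env bash
# Requires bash 4+ (coproc). run_limited and PROC_LIMIT are hypothetical
# names for this sketch; they are not taken from the Yetus source.

PROC_LIMIT=${PROC_LIMIT:-1000}

# Usage: run_limited <logfile> <command> [args...]
run_limited() {
  local logfile=$1
  shift
  local pid

  # A coprocess runs asynchronously in a subshell, so the ulimit below
  # constrains only that subshell and the build tool it launches.
  coproc LIMITED {
    ulimit -Su "${PROC_LIMIT}" \
      || echo "WARNING: unable to set process limit" >&2
    "$@" < /dev/null > "${logfile}" 2>&1
  }
  # Grab the PID right away; $! is used because LIMITED_PID is unset
  # once the coprocess has exited.
  pid=$!

  # The calling shell keeps its original limits while it waits, and the
  # command's exit status is passed back to the caller.
  wait "${pid}"
}
```

Something like {{run_limited "${logfile}" mvn clean install}} would then run maven behind the limit while the caller stays unrestricted. The coproc's own pipes go unused here since the output is sent straight to the log file.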

* Prior to launching Docker, yetus should kill processes that match reaper 
names that are older than X days.  This would help out on things like what I'm 
seeing on H3:

{code}
jenkins   4228  4203  0 Jul18 ?        00:00:00 /bin/bash -c java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -Dlog4j.properties=custom_log4j.properties -Djava.io.tmpdir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-localDir-nm-0_0/usercache/jenkins/appcache/application_1500396215695_0001/container_1500396215695_0001_01_000001/tmp -Xmx96m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/dt-heap-1.bin -Dhadoop.root.logger=DEBUG,RFA -Dhadoop.log.dir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001 -Ddt.attr.APPLICATION_PATH=file:/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/testSetupShutdown -Dapex.application.name=$'testApp' -Dlog4j.debug=true com.datatorrent.stram.StreamingAppMaster 1>/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001/AppMaster.stdout 2>/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001/AppMaster.stderr
jenkins   4257  4228  0 Jul18 ?        01:21:58 java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -Dlog4j.properties=custom_log4j.properties -Djava.io.tmpdir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-localDir-nm-0_0/usercache/jenkins/appcache/application_1500396215695_0001/container_1500396215695_0001_01_000001/tmp -Xmx96m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/dt-heap-1.bin -Dhadoop.root.logger=DEBUG,RFA -Dhadoop.log.dir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001 -Ddt.attr.APPLICATION_PATH=file:/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/testSetupShutdown -Dapex.application.name=testApp -Dlog4j.debug=true com.datatorrent.stram.StreamingAppMaster
jenkins  13391     1  0 Sep21 ?        00:48:46 /usr/local/jenkins/java/jdk1.7.0_55/jre/bin/java -Xmx2048m -XX:MaxPermSize=768m -XX:+HeapDumpOnOutOfMemoryError -DminiClusterDedicatedDirs=true -jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefirebooter6160994687041676643.jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire3219510197920875932tmp /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire_7295825218010268940443tmp
jenkins  19952     1  0 Sep11 ?        01:01:57 /usr/local/jenkins/java/jdk1.7.0_55/jre/bin/java -Xmx2048m -XX:MaxPermSize=768m -XX:+HeapDumpOnOutOfMemoryError -DminiClusterDedicatedDirs=true -jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefirebooter741259645380147856.jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire7882552254579184169tmp /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire_7302959699799093808908tmp
jenkins  22540     1  0 Aug30 ?        01:17:34 /usr/local/jenkins/java/jdk1.7.0_55/jre/bin/java -Xmx2048m -XX:MaxPermSize=768m -XX:+HeapDumpOnOutOfMemoryError -DminiClusterDedicatedDirs=true -jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefirebooter2604914348066585301.jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire6579405034700130938tmp /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire_730278414035615526144tmp
jenkins  28511     1 64 Sep17 ?        27-10:53:42 /usr/local/asfpackages/java/jdk1.8.0_144/jre/bin/java -Xmx1024m -da -jar /home/jenkins/jenkins-slave/workspace/PreCommit-OOZIE-Build/core/target/surefire/surefirebooter7264422334614348257.jar /home/jenkins/jenkins-slave/workspace/PreCommit-OOZIE-Build/core/target/surefire/surefire1763721606174158064tmp /home/jenkins/jenkins-slave/workspace/PreCommit-OOZIE-Build/core/target/surefire/surefire_3143712379344115217tmp
{code}
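
For the second idea, a minimal sketch of what an age-based cleanup might look like. Everything here is an assumption for illustration: the function names, passing the reaper names as a single regex, the DRY_RUN switch, and using the ps etime column to measure age. It only looks at processes owned by the current user and defaults to reporting rather than killing:

```bash
#!/usr/bin/env bash
# Sketch only: the function names, the pattern argument, and DRY_RUN are
# hypothetical; none of this is taken from the Yetus source.

# Convert a ps "etime" value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
  local etime=$1 days=0 hours=0 mins=0 secs=0
  local -a parts
  if [[ ${etime} == *-* ]]; then
    days=${etime%%-*}
    etime=${etime#*-}
  fi
  IFS=: read -r -a parts <<< "${etime}"
  case ${#parts[@]} in
    3) hours=${parts[0]}; mins=${parts[1]}; secs=${parts[2]} ;;
    2) mins=${parts[0]}; secs=${parts[1]} ;;
    *) return 1 ;;
  esac
  # 10# forces base 10 so that "08" and "09" are not read as octal.
  echo $(( 10#${days} * 86400 + 10#${hours} * 3600 \
           + 10#${mins} * 60 + 10#${secs} ))
}

# Usage: find_stale_pids <regex> <max_days>
# Prints PIDs owned by the current user whose command line matches the
# regex and whose elapsed time exceeds max_days.
find_stale_pids() {
  local pattern=$1 max_days=$2
  local pid etime args age
  while read -r pid etime args; do
    [[ ${args} =~ ${pattern} ]] || continue
    age=$(etime_to_seconds "${etime}") || continue
    if (( age > max_days * 86400 )); then
      echo "${pid}"
    fi
  done < <(ps -u "$(id -un)" -o pid= -o etime= -o args=)
}

# Usage: reap_stale <regex> <max_days>
# Reports matches; only sends a signal when DRY_RUN=false.
reap_stale() {
  local pid
  for pid in $(find_stale_pids "$@"); do
    if [[ ${DRY_RUN:-true} == true ]]; then
      echo "would kill stale process ${pid}"
    else
      kill "${pid}"
    fi
  done
}
```

With the listing above, something like {{reap_stale 'surefirebooter' 7}} would report the leftover surefire JVMs, and {{DRY_RUN=false reap_stale 'surefirebooter' 7}} would actually signal them.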



was (Author: aw):
After a terrible sleep, two ideas:

* On bash4, change the order up: 
** Remove the ulimit from its current location
** run echo_and_redirect as a coproc with the ulimit in place.  This would 
allow the primary test-patch code to run unrestricted by actually putting the 
proc limit walls up around ant, make, maven, etc.

* Prior to launching Docker, yetus should kill processes that match reaper 
names that are older than X days.  This would help out on things like what I'm 
seeing on H2:

{code}
jenkins   4228  4203  0 Jul18 ?        00:00:00 /bin/bash -c java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -Dlog4j.properties=custom_log4j.properties -Djava.io.tmpdir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-localDir-nm-0_0/usercache/jenkins/appcache/application_1500396215695_0001/container_1500396215695_0001_01_000001/tmp -Xmx96m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/dt-heap-1.bin -Dhadoop.root.logger=DEBUG,RFA -Dhadoop.log.dir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001 -Ddt.attr.APPLICATION_PATH=file:/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/testSetupShutdown -Dapex.application.name=$'testApp' -Dlog4j.debug=true com.datatorrent.stram.StreamingAppMaster 1>/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001/AppMaster.stdout 2>/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001/AppMaster.stderr
jenkins   4257  4228  0 Jul18 ?        01:21:58 java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -Dlog4j.properties=custom_log4j.properties -Djava.io.tmpdir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-localDir-nm-0_0/usercache/jenkins/appcache/application_1500396215695_0001/container_1500396215695_0001_01_000001/tmp -Xmx96m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/dt-heap-1.bin -Dhadoop.root.logger=DEBUG,RFA -Dhadoop.log.dir=/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/com.datatorrent.stram.StramMiniClusterTest-logDir-nm-0_0/application_1500396215695_0001/container_1500396215695_0001_01_000001 -Ddt.attr.APPLICATION_PATH=file:/home/jenkins/jenkins-slave/workspace/Apex_Core_PR/engine/target/com.datatorrent.stram.StramMiniClusterTest/testSetupShutdown -Dapex.application.name=testApp -Dlog4j.debug=true com.datatorrent.stram.StreamingAppMaster
jenkins  13391     1  0 Sep21 ?        00:48:46 /usr/local/jenkins/java/jdk1.7.0_55/jre/bin/java -Xmx2048m -XX:MaxPermSize=768m -XX:+HeapDumpOnOutOfMemoryError -DminiClusterDedicatedDirs=true -jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefirebooter6160994687041676643.jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire3219510197920875932tmp /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire_7295825218010268940443tmp
jenkins  19952     1  0 Sep11 ?        01:01:57 /usr/local/jenkins/java/jdk1.7.0_55/jre/bin/java -Xmx2048m -XX:MaxPermSize=768m -XX:+HeapDumpOnOutOfMemoryError -DminiClusterDedicatedDirs=true -jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefirebooter741259645380147856.jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire7882552254579184169tmp /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire_7302959699799093808908tmp
jenkins  22540     1  0 Aug30 ?        01:17:34 /usr/local/jenkins/java/jdk1.7.0_55/jre/bin/java -Xmx2048m -XX:MaxPermSize=768m -XX:+HeapDumpOnOutOfMemoryError -DminiClusterDedicatedDirs=true -jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefirebooter2604914348066585301.jar /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire6579405034700130938tmp /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire_730278414035615526144tmp
jenkins  28511     1 64 Sep17 ?        27-10:53:42 /usr/local/asfpackages/java/jdk1.8.0_144/jre/bin/java -Xmx1024m -da -jar /home/jenkins/jenkins-slave/workspace/PreCommit-OOZIE-Build/core/target/surefire/surefirebooter7264422334614348257.jar /home/jenkins/jenkins-slave/workspace/PreCommit-OOZIE-Build/core/target/surefire/surefire1763721606174158064tmp /home/jenkins/jenkins-slave/workspace/PreCommit-OOZIE-Build/core/target/surefire/surefire_3143712379344115217tmp
{code}


> Report and optionally kill stale JVMs between unit test modules
> ---------------------------------------------------------------
>
>                 Key: YETUS-570
>                 URL: https://issues.apache.org/jira/browse/YETUS-570
>             Project: Yetus
>          Issue Type: New Feature
>          Components: Test Patch
>    Affects Versions: 0.6.0
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>         Attachments: YETUS-570.wip.00.patch, YETUS-570.wip.01.patch
>
>
> YETUS-561  does a great job of preventing fork bombs and stale processes from 
> destroying machines.  However, Yetus should do a better job of:
> a) Reporting when that happens
> b) Preventing that from happening



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
