houweidong commented on issue #14742: CPU memory leak when running train_yolov3.py
URL: https://github.com/apache/incubator-mxnet/issues/14742#issuecomment-485638529
 
 
   ps aux --sort -rss | head -20
   ```
   USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
   root      7551  436 13.3 134231992 13047996 pts/0 Sl+ 11:28 122:55 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8016 93.1  7.5 12621720 7394328 pts/0 Sl+ 11:29  25:22 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      7967 91.7  7.5 12631112 7379188 pts/0 Sl+ 11:29  25:00 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8034 91.9  7.5 12604824 7378900 pts/0 Sl+ 11:29  25:02 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      7988 92.0  7.5 12605612 7378328 pts/0 Sl+ 11:29  25:05 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      7997 91.9  7.5 12604048 7377076 pts/0 Rl+ 11:29  25:03 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8025 92.1  7.5 12573304 7345996 pts/0 Sl+ 11:29  25:06 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      7979 92.1  7.5 12570536 7344564 pts/0 Sl+ 11:29  25:06 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8007 92.4  7.5 12562104 7334816 pts/0 Sl+ 11:29  25:11 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8076  0.0  7.4 12646640 7315760 pts/0 Sl+ 11:29   0:00 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8085  0.0  7.4 12646640 7315760 pts/0 Sl+ 11:29   0:00 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8094  0.0  7.4 12646640 7315760 pts/0 Sl+ 11:29   0:00 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8103  0.0  7.4 12646640 7315760 pts/0 Sl+ 11:29   0:00 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8115  0.0  7.4 12646640 7315760 pts/0 Sl+ 11:29   0:00 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8058  0.0  7.4 12646640 7315756 pts/0 Sl+ 11:29   0:00 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8067  0.0  7.4 12646640 7315756 pts/0 Sl+ 11:29   0:00 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8049  0.0  7.4 12646640 7315752 pts/0 Sl+ 11:29   0:00 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root     10965  2.5  1.5 12241692 1481400 pts/0 Sl+ 4月05 639:09 
/root/pycharm/jre64/bin/java -classpath 
/root/pycharm/lib/bootstrap.jar:/root/pycharm/lib/extensions.jar:/root/pycharm/lib/util.jar:/root/pycharm/lib/jdom.jar:/root/pycharm/lib/log4j.jar:/root/pycharm/lib/trove4j.jar:/root/pycharm/lib/jna.jar
 -Xms128m -Xmx750m -XX:ReservedCodeCacheSize=240m -XX:+UseConcMarkSweepGC 
-XX:SoftRefLRUPolicyMSPerMB=50 -ea -Dsun.io.useCanonCaches=false 
-Djava.net.preferIPv4Stack=true -Djdk.http.auth.tunneling.disabledSchemes="" 
-XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow 
-Dawt.useSystemAAFontSettings=lcd 
-Dsun.java2d.renderer=sun.java2d.marlin.MarlinRenderingEngine 
-XX:ErrorFile=/root/java_error_in_PYCHARM_%p.log 
-XX:HeapDumpPath=/root/java_error_in_PYCHARM.hprof 
-Didea.paths.selector=PyCharmCE2018.3 
-Djb.vmOptionsFile=/root/pycharm/bin/pycharm64.vmoptions 
-Didea.platform.prefix=PyCharmCore com.intellij.idea.Main
   erised   35435  5.0  0.9 3785860 974604 ?      Sl   4月09 993:36 
/opt/teamviewer/tv_bin/TeamViewer_Desktop
   
   ```
   This is at epoch 5; CPU memory usage is about 30 GB.
   
   
   ......
   
   
   ps aux --sort -rss | head -20
   ```
   USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
   root      7551  438 13.5 129978316 13223692 pts/0 Sl+ 11:28 244:38 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      7997 90.6  7.5 12600784 7374952 pts/0 Sl+ 11:29  49:44 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8025 91.1  7.5 12590040 7364248 pts/0 Sl+ 11:29  50:00 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      7967 90.7  7.5 12638704 7356276 pts/0 Sl+ 11:29  49:47 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      7979 91.3  7.5 12573112 7346988 pts/0 Sl+ 11:29  50:06 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8034 90.7  7.5 12572472 7346384 pts/0 Sl+ 11:29  49:47 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      7988 90.8  7.5 12618464 7346056 pts/0 Sl+ 11:29  49:52 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8007 90.9  7.5 12571272 7345172 pts/0 Sl+ 11:29  49:54 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8016 91.3  7.5 12570272 7344144 pts/0 Sl+ 11:29  50:09 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8094  0.3  7.5 12646896 7325600 pts/0 Sl+ 11:29   0:10 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8103  0.3  7.5 12646896 7325600 pts/0 Sl+ 11:29   0:10 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8049  0.3  7.5 12646896 7325596 pts/0 Sl+ 11:29   0:10 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8058  0.3  7.5 12646896 7325596 pts/0 Sl+ 11:29   0:10 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8085  0.3  7.5 12646896 7325596 pts/0 Sl+ 11:29   0:10 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8115  0.3  7.5 12646896 7325596 pts/0 Sl+ 11:29   0:10 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8076  0.2  7.5 12646896 7325592 pts/0 Sl+ 11:29   0:09 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root      8067  0.3  7.5 12646896 7325588 pts/0 Sl+ 11:29   0:10 
/usr/bin/python3.5 /root/models/yolov3_origin/train_yolo3.py --gpus 1,2 
--network darknet53 --syncbn --batch-size 16 -j 8 --val-interval 10
   root     10965  2.5  1.5 12241692 1482248 pts/0 Sl+ 4月05 640:07 
/root/pycharm/jre64/bin/java -classpath 
/root/pycharm/lib/bootstrap.jar:/root/pycharm/lib/extensions.jar:/root/pycharm/lib/util.jar:/root/pycharm/lib/jdom.jar:/root/pycharm/lib/log4j.jar:/root/pycharm/lib/trove4j.jar:/root/pycharm/lib/jna.jar
 -Xms128m -Xmx750m -XX:ReservedCodeCacheSize=240m -XX:+UseConcMarkSweepGC 
-XX:SoftRefLRUPolicyMSPerMB=50 -ea -Dsun.io.useCanonCaches=false 
-Djava.net.preferIPv4Stack=true -Djdk.http.auth.tunneling.disabledSchemes="" 
-XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow 
-Dawt.useSystemAAFontSettings=lcd 
-Dsun.java2d.renderer=sun.java2d.marlin.MarlinRenderingEngine 
-XX:ErrorFile=/root/java_error_in_PYCHARM_%p.log 
-XX:HeapDumpPath=/root/java_error_in_PYCHARM.hprof 
-Didea.paths.selector=PyCharmCE2018.3 
-Djb.vmOptionsFile=/root/pycharm/bin/pycharm64.vmoptions 
-Didea.platform.prefix=PyCharmCore com.intellij.idea.Main
   erised   35435  5.0  1.0 3785860 990668 ?      Sl   4月09 999:05 
/opt/teamviewer/tv_bin/TeamViewer_Desktop
   
   ```
   This is at epoch 10; CPU memory usage is about 40 GB.
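
   To compare snapshots like the two above more systematically, the RSS column can be totalled per command. The following is only a minimal sketch, not something from the training script: the helper name `sum_rss_kib` is hypothetical, and it assumes the standard 11-column `ps aux` layout with each process on a single unwrapped line. Note that RSS counts pages shared between forked worker processes once per process, so the total is an upper bound and can be far larger than the memory actually in use.

   ```python
   def sum_rss_kib(ps_output, needle):
       """Return (process_count, total_rss_kib) for `ps aux` rows whose
       COMMAND column contains `needle`.

       Hypothetical helper for illustration. RSS is reported by ps in KiB,
       and shared pages are counted in every process that maps them.
       """
       count = 0
       total = 0
       for line in ps_output.splitlines():
           # Split into at most 11 fields so that COMMAND (the last column)
           # keeps its embedded spaces.
           fields = line.split(None, 10)
           if len(fields) < 11 or fields[0] == "USER":
               continue  # skip the header row and malformed lines
           if needle in fields[10]:
               total += int(fields[5])  # sixth column is RSS
               count += 1
       return count, total
   ```

   One possible use is to capture the text with `subprocess.check_output(["ps", "aux"], universal_newlines=True)` once per epoch and log the result of `sum_rss_kib(text, "train_yolo3.py")`; a steadily rising per-process RSS for the main process or the data-loader workers would then show up without reading the full listing.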

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
