houweidong edited a comment on issue #14742: CPU memory leak when running train_yolov3.py
URL: https://github.com/apache/incubator-mxnet/issues/14742#issuecomment-486058607

in the docker container:
at epoch 0:
```
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            47G  1.2G   46G   3% /dev/shm
```
at epoch 10:
```
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            47G   16G   32G  33% /dev/shm
```
at epoch 16:
```
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            47G   30G   18G  63% /dev/shm
```
at epoch 23
```
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            47G   43G  3.7G  93% /dev/shm
```
and at epoch 25, the error happened:
```
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 142, in _serve
    with self._listener.accept() as conn:
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 455, in accept
    deliver_challenge(c, self._authkey)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 722, in deliver_challenge
    response = connection.recv_bytes(256)        # reject large message
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
```
it should be the full shm leads to
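
For anyone trying to reproduce the `/dev/shm` numbers above without running `df` by hand, here is a minimal sketch that reads the same figures from Python. It only uses the standard library (`shutil.disk_usage`) and avoids f-strings so it also runs on the Python 3.5 shown in the traceback. The helper names `shm_usage` and `format_usage` are hypothetical and not part of train_yolov3.py or MXNet, and where to call them in the training loop is an assumption.

```python
import shutil

GIB = 1024 ** 3


def shm_usage(path="/dev/shm"):
    """Return (used_bytes, total_bytes, percent_used) for the filesystem at `path`."""
    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return usage.used, usage.total, percent


def format_usage(epoch, used, total, percent):
    """Format one log line roughly comparable to the `df -h` output above."""
    return "epoch {}: shm used {:.1f}G / {:.1f}G ({:.0f}%)".format(
        epoch, used / GIB, total / GIB, percent
    )


# Hypothetical use at the end of each epoch in a training loop:
#     print(format_usage(epoch, *shm_usage()))
```

Logging this once per epoch should make it easier to see whether usage keeps growing steadily, as in the figures above, or levels off.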
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
