tpyl commented on pull request #20491:
URL: 
https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-1004978787


   @nswamy @aaronmarkham 
   Looks like unix-gpu (e.g. ubuntu_gpu_cu114) which is bumped up to Ubuntu 
20.04 by this PR (because TensorRT installation is by far the most 
straightforward for TensorRT8 + Ubuntu 20.04) suffers from the same reference 
leak issue that has been flagged for CentOS. 
   
   This could be a clue as to why reference leaks happen? I don't think this is 
an issue strictly introduced by this PR. 
   
   Similarly, it is hard to see how the segfault for centos-gpu could be caused 
by the changes in this PR. None of the TensorRT code is compiled in for the 
centos-gpu tests, and none of the CentOS CI code is changed. (The only place 
where we build mxnet with -DUSE_TENSORRT=1 is in build_ubuntu_gpu_tensorrt().) 
   
   I would be grateful for any suggestions on how to get the tests passing for 
this PR. 
   ```
   [2022-01-04T06:01:03.508Z] ==================================== ERRORS ====================================
   [2022-01-04T06:01:03.508Z] _ ERROR at teardown of test_np_standard_binary_funcs[lshape5-rshape5-add-add-True-numeric-<lambda>-None--1.0-1.0] _
   [2022-01-04T06:01:03.508Z] [gw1] linux -- Python 3.8.10 /usr/bin/python3
   [2022-01-04T06:01:03.508Z] 
   [2022-01-04T06:01:03.508Z] request = <SubRequest 'check_leak_ndarray' for <Function test_np_standard_binary_funcs[lshape5-rshape5-add-add-True-numeric-<lambda>-None--1.0-1.0]>>
   [2022-01-04T06:01:03.508Z] 
   [2022-01-04T06:01:03.508Z]     @pytest.fixture(autouse=True)
   [2022-01-04T06:01:03.508Z]     def check_leak_ndarray(request):
   [2022-01-04T06:01:03.508Z]         garbage_expected = request.node.get_closest_marker('garbage_expected')
   [2022-01-04T06:01:03.508Z]         if garbage_expected:  # Some tests leak references. They should be fixed.
   [2022-01-04T06:01:03.508Z]             yield  # run test
   [2022-01-04T06:01:03.508Z]             return
   [2022-01-04T06:01:03.508Z]     
   [2022-01-04T06:01:03.508Z]         if 'centos' in platform.platform():
   [2022-01-04T06:01:03.508Z]             # Multiple tests are failing due to reference leaks on CentOS. It's not
   [2022-01-04T06:01:03.508Z]             # yet known why there are more memory leaks in the Python 3.6.9 version
   [2022-01-04T06:01:03.508Z]             # shipped on CentOS compared to the Python 3.6.9 version shipped in
   [2022-01-04T06:01:03.508Z]             # Ubuntu.
   [2022-01-04T06:01:03.508Z]             yield
   [2022-01-04T06:01:03.508Z]             return
   ```
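
For context on what a teardown check of this kind looks for, here is a minimal, standalone sketch of one common way to detect objects that are kept alive only by reference cycles, using Python's `gc` module. It is not the code of the `check_leak_ndarray` fixture quoted in the log: `FakeArray`, `count_cyclic_garbage`, `leaky` and `clean` are hypothetical names made up for illustration, and the real fixture may work differently.

```python
import gc


class FakeArray:
    """Hypothetical stand-in for an array type (not an MXNet class)."""

    def __init__(self):
        self.ref = None


def count_cyclic_garbage(func, cls):
    """Run func and count cls instances that only the cycle collector could free."""
    gc.collect()                    # start from a clean state
    gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects in gc.garbage
    try:
        func()
        gc.collect()                # anything found now was left behind by func
        leaked = [obj for obj in gc.garbage if isinstance(obj, cls)]
        return len(leaked)
    finally:
        gc.set_debug(0)             # restore normal collection
        gc.garbage.clear()


def leaky():
    a, b = FakeArray(), FakeArray()
    a.ref, b.ref = b, a             # reference cycle: refcounts never reach zero


def clean():
    FakeArray()                     # freed immediately by reference counting


if __name__ == "__main__":
    print("leaky:", count_cyclic_garbage(leaky, FakeArray))
    print("clean:", count_cyclic_garbage(clean, FakeArray))
```

In this sketch `leaky` leaves two `FakeArray` instances for the cycle collector, while `clean` leaves none. Because the result depends on which objects end up in reference cycles, differences in the interpreter or in library versions between platforms could plausibly change what such a check reports.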


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

