leezu commented on pull request #18704: URL: https://github.com/apache/incubator-mxnet/pull/18704#issuecomment-678888452
@szha the GPU memory profiler tests have never been run. They were disabled until https://github.com/apache/incubator-mxnet/pull/18701 because they were declared in the wrong file. As per the comment in the test file, the test was designed to capture 26 lines, but it now captures 33. In particular, the following lines were not recorded before and appear to be the problematic ones. I'm not sure what they refer to, or if / why they were recorded differently before.

```
[2020-07-15T20:15:02.236Z] <unk>:_random_uniform,67108864,0,67108864,0
[2020-07-15T20:15:02.236Z] <unk>:_random_uniform,67108864,0,67108864,0
[2020-07-15T20:15:02.236Z] <unk>:_zeros,67108864,0,67108864,0
[2020-07-15T20:15:02.236Z] <unk>:_zeros,67108864,0,67108864,0
[2020-07-15T20:15:02.236Z] <unk>:in_arg:data,640,0,640,0
[2020-07-15T20:15:02.236Z] <unk>:unknown,67108864,0,67108864,0
[2020-07-15T20:15:02.236Z] <unk>:unknown,67108864,0,67108864,0
```

I don't think that recording more events in the memory profiler is related to the name-scope refactor, as the profiler is enabled and disabled separately from the name-scope.

Also note that the profiler scope implementation has some fundamental threading issues. The scope in the backend is not thread-local, but the frontend "claims" thread safety by using a thread-local variable. The same problem applies to single-threaded but asynchronous code.

@ArmageddonKnight the profiler scope is automatically set in Block and Optimizer. Would you be able to debug the issue and fix it in this PR?
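To illustrate the threading concern, here is a minimal sketch in plain Python. It does not use MXNet: `_backend_scope`, `_frontend_scope`, `set_scope` and `allocate` are hypothetical stand-ins, assuming the frontend keeps a thread-local copy of the scope while pushing it into a single process-wide backend slot. Two events force an interleaving in which thread B overwrites the shared slot before thread A allocates.

```python
import threading

# Hypothetical stand-ins for illustration only; these are not MXNet APIs.
_backend_scope = "<unk>:"            # one process-wide slot, like a non-thread-local backend scope
_frontend_scope = threading.local()  # per-thread copy, like the frontend's thread-local variable


def set_scope(name):
    """Store the scope thread-locally in the frontend and push it to the shared backend slot."""
    global _backend_scope
    _frontend_scope.value = name
    _backend_scope = name


def allocate(tag, log):
    """Simulate an allocation that the backend attributes to its current (shared) scope."""
    frontend_view = getattr(_frontend_scope, "value", "<unk>:")
    log.append((frontend_view, _backend_scope + tag))


def demo():
    log = []
    a_set = threading.Event()
    b_set = threading.Event()

    def worker_a():
        set_scope("blockA:")
        a_set.set()
        b_set.wait()  # wait until thread B has overwritten the shared backend scope
        allocate("weight", log)

    def worker_b():
        a_set.wait()
        set_scope("blockB:")
        b_set.set()

    ta = threading.Thread(target=worker_a)
    tb = threading.Thread(target=worker_b)
    ta.start()
    tb.start()
    ta.join()
    tb.join()
    return log


if __name__ == "__main__":
    # Thread A believes it is in "blockA:", but its allocation is tagged with "blockB:".
    print(demo())
```

The thread-local variable only protects the frontend's view; the backend still sees whichever thread wrote last. Interleaved asynchronous tasks on a single thread can produce the same mismatch, since they share one thread-local value.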
https://github.com/apache/incubator-mxnet/blob/0de7484884292eb028342b1e5669233792429af0/python/mxnet/gluon/block.py#L951-L952
https://github.com/apache/incubator-mxnet/blob/0de7484884292eb028342b1e5669233792429af0/python/mxnet/optimizer/updater.py#L58-L59

Parameter initialization happens separately from Block, so if you want to set the profiler scope there, you can add `with _profiler.scope(self._uuid + ':'):` in `_finish_deferred_init` inside `parameter.py`.
