szha commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-496074204
 
 
   > NONE of the existing Python frontend API (e.g., LSTMCell) ... are having 
this profiler scope information
   
   As mentioned before, for Gluon Blocks such as LSTMCell we can use the class 
   name as the default scope, so that users don't need to mark them manually.
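A minimal sketch of what that default could look like, assuming a context-manager-style scope stack. The `profiler_scope` helper and the `:`-joined scope-path format are illustrative, not MXNet's actual API:

```python
import contextlib

_scope_stack = []  # current profiler-scope path, innermost scope last

@contextlib.contextmanager
def profiler_scope(name):
    # hypothetical helper: push a scope name for the duration of a block
    _scope_stack.append(name)
    try:
        yield
    finally:
        _scope_stack.pop()

def current_scope():
    return ":".join(_scope_stack) or "<unk>"

class Block:
    def __call__(self, *args):
        # default scope = class name, so allocations inside an LSTMCell
        # are tagged "LSTMCell" without any user annotation
        with profiler_scope(type(self).__name__):
            return self.forward(*args)

class LSTMCell(Block):
    def forward(self, x):
        # stand-in for an allocation being tagged with the active scope
        return current_scope()

print(LSTMCell()(None))  # → LSTMCell
```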
   
   > and applications (e.g., Sockeye, MXNet Examples)
   
   I assume this points to the symbolic API examples. I think it's safe not to 
   worry too much about that for now, because 1) we recommend that users use 
   HybridBlocks instead of the symbolic API directly, and 2) codebases such as 
   Sockeye are migrating to the Gluon API.
   
   > In terms of depth ... we need to modify the nnvm codebase to achieve this
   
   Even if we were to implement this for the symbolic API, setting an 
   additional attribute in the MXNet C API should suffice for recording the 
   identifiers without any change to nnvm. If we rely solely on symbol names, 
   there's no guarantee that the names were consciously chosen or are 
   recognizable to users, as it's common practice to name only the input 
   variables.
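As an illustration of why no nnvm change is needed: the scope travels as one more entry in the node's attribute dictionary, orthogonal to whatever name the symbol happens to have. The `Symbol` class and the `__profiler_scope__` key below are stand-ins, not the actual MXNet/nnvm types:

```python
class Symbol:
    # simplified stand-in for a graph node with a name and attribute dict
    def __init__(self, name, attrs=None):
        self.name = name               # may be auto-generated, e.g. "_plus0"
        self.attrs = dict(attrs or {})  # extra attributes travel with the node

def make_symbol(name, scope):
    # recording the identifier is just one more key in the attribute dict,
    # independent of whether the user chose a meaningful symbol name
    return Symbol(name, {"__profiler_scope__": scope})

s = make_symbol("_plus0", "encoder:lstm")
print(s.attrs["__profiler_scope__"])  # → encoder:lstm
```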
   
   > Although this information can be recovered by gradually refining the 
profiler scope, it also requires multiple run of the application to find the 
appropriate granularity, which can be really tedious for applications such as 
Sockeye that spend large amount of time in data preprocessing.
   
   Given that data preprocessing mostly involves NDArrays, using scopes would 
   be favorable compared to naming every NDArray: it's `O(# of scopes)` vs. 
   `O(# of arrays)` even when comparing just the manual work.
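To make the cost comparison concrete (the array names below are made up), one scope annotation covers every array allocated inside it, whereas name-based labeling requires one label per array:

```python
# arrays created during a hypothetical preprocessing step
arrays_in_preprocessing = ["tokens", "lengths", "buckets", "batch"]

# scope-based: a single annotation covers them all — O(# of scopes)
scoped = {name: "data_preprocessing" for name in arrays_in_preprocessing}

# name-based: one manual label per array — O(# of arrays)
named = {name: f"data_preprocessing:{name}" for name in arrays_in_preprocessing}

print(len(set(scoped.values())))  # → 1
print(len(set(named.values())))   # → 4
```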
   
   > This is why I suggest we should instead go for the "measure once, 
aggregate multiple times" approach.
   
   The alternative approach of naming each identifier makes every variable 
   uniquely identifiable, which the current PR's approach achieves only when 
   users name the variables carefully. Also, the hierarchy information in the 
   profiler dump gives strictly more information about the structure of the 
   program. Finally, nothing prevents users from changing the aggregation level 
   if that's desirable, and more intelligent options should be possible instead 
   of asking users to supply a mapping for each name.
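For instance, assuming the dump records a hierarchical scope per allocation (the field layout and sample numbers here are assumptions, not the PR's actual format), the aggregation level can be changed offline without rerunning the application:

```python
from collections import defaultdict

# made-up sample dump: (hierarchical scope, bytes allocated)
dump = [
    ("model:encoder:lstm", 100),
    ("model:encoder:embed", 50),
    ("model:decoder:lstm", 120),
]

def aggregate(entries, depth):
    # roll allocations up to the first `depth` levels of the scope path
    totals = defaultdict(int)
    for scope, nbytes in entries:
        key = ":".join(scope.split(":")[:depth])
        totals[key] += nbytes
    return dict(totals)

print(aggregate(dump, 1))  # → {'model': 270}
print(aggregate(dump, 2))  # → {'model:encoder': 150, 'model:decoder': 120}
```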
   
   Based on the answers above, here's how I would compare the two approaches:
   
   |  | Proposed Solution | Current Solution |
   | --- | --- | --- |
   | Code Changes | All existing application codebases | ~~`SETME.py` (usually 3-5 lines of code)~~<br/>- all imperative codebases, on the order of `O(# of arrays)`<br/>- plus a manual mapping of all array names for different aggregation levels, again on the order of `O(# of arrays)` |
   | Profiler Scope | ~~Fixed. Need to rerun the application for a coarser- or finer-grained profiler scope.~~<br/>- no need to rerun; the only thing changed is how the names are assigned, which doesn't prevent logging<br/>- hierarchical information is feasible and already exists in Gluon blocks<br/>- different aggregation levels can be used if so desired | Flexible; the keyword dictionary can be redefined multiple times in `SETME.py`. No need to rerun the application since the profiler log is present. (@szha: yes, at the cost of `O(# of arrays)` both in terms of labor and of human memory.) |

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services