ArmageddonKnight edited a comment on issue #14973: [MXNET-1404] Added the GPU memory profiler

URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-495481289

Hi @anirudh2290,

Thanks for your valuable feedback. Let me record a list of discussion and TODO items:

## Discussion

- [x] Preprocessor Directives (1. 2.)
  - The reason I gated the GPU memory profiler with header flags rather than compilation flags is that with header flags, when users switch the GPU memory profiler on or off, **the build system can automatically figure out the build dependencies for them**, whereas with compilation flags they would need to recompile all of MXNet from scratch.
- [ ] Separate Preprocessor Directives (3.)
  - As their names indicate, **storage tagging** and **GPU memory profiling** are two separate things: **storage tagging** is equally applicable to CPU memory profiling (or other devices), while **GPU memory profiling** is only one example that benefits from it. That is why I keep them separate.
  - I would like to propose **PERMANENTLY adding storage tagging** to the C++ backend, as I believe it could be a useful feature in the future, and I do not think propagating an extra string from the Python frontend to the C++ backend will hurt performance, because (1) most strings are small, and (2) storage tagging only happens once, at the beginning of training.
- [ ] Imperative Support (4.)
  - We should **NOT** drop support for the imperative mode, as doing so would significantly increase the amount of GPU memory allocated under the unknown tag. For instance, most optimizer states are currently allocated using the purely imperative approach. In fact, almost all current optimizer implementations (e.g., SGD, Adam) initialize their optimizer states with `mx.nd.zeros`. If we drop imperative support, all of those allocations fall into the `unknown` category, which can amount to 1 GB for large models.
- [ ] Profiler API Integration (5. 6. 7. 8.)
  - The GPU memory profiler differs from the existing profilers in several ways: (1) It does not use the `chrome://tracing` visualization backend, because it needs to accept user input that defines the **keyword dictionaries for grouping storage tags** (also, I do not see a good way of visualizing bar charts with `chrome://tracing`). (2) Because it requires user input, users must first look into the memory profiler logs to see what contributes most to the memory footprint; this is why those logs are stored as `.csv`, so that users can digest them first. (3) The current profiler API design is geared more towards performance profiling, which in my opinion is different from storage profiling (e.g., in storage profiling you do not really need the `pause()` and `resume()` API calls).
  - Based on these points, I decided to keep the GPU memory profiler separate from the existing profiler APIs, because I do not see a good way of integrating them.

## TODO

- [ ] Add a minimum working example in the `example` directory showing how `SETME`, the analyzer, and the plotter work (perhaps using other MXNet examples as a reference).
- [ ] (9.) Add a CI stage with the build flag enabled.
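To make the "keyword dictionaries for grouping storage tags" idea concrete, here is a minimal sketch of how a user could post-process the `.csv` memory log into grouped totals for a bar chart. This is illustrative only: the column names (`Attribute Name`, `Requested Size`), the keyword dictionary, and the `group_allocations` helper are all hypothetical, not the profiler's actual schema or API. Note that unmatched tags fall into the `unknown` category discussed above.

```python
# Hypothetical sketch: grouping storage tags from a GPU memory
# profiler CSV log with a user-defined keyword dictionary.
# The column names below are assumptions, not the real schema.
import csv
import io
from collections import defaultdict

# User-defined keyword dictionary: a keyword found in a storage tag
# maps the allocation to a human-readable group. Order matters, since
# the first matching keyword wins (e.g. "grad" before "weight" so that
# "fc1_weight_grad" is counted as a gradient, not a parameter).
KEYWORD_GROUPS = {
    "grad": "Gradients",
    "weight": "Parameters",
    "state": "Optimizer States",
}

def group_allocations(csv_text):
    """Sum requested bytes per group; unmatched tags go to 'unknown'."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        tag = row["Attribute Name"]
        size = int(row["Requested Size"])
        for keyword, group in KEYWORD_GROUPS.items():
            if keyword in tag:
                totals[group] += size
                break
        else:
            totals["unknown"] += size
    return dict(totals)

# Tiny fabricated log for illustration.
sample_log = """Attribute Name,Requested Size
fc1_weight,4096
fc1_weight_grad,4096
sgd_state,8192
untagged_alloc,1024
"""

print(group_allocations(sample_log))
# {'Parameters': 4096, 'Gradients': 4096, 'Optimizer States': 8192, 'unknown': 1024}
```

The grouped totals could then be fed directly to a plotting script to produce the bar chart mentioned above.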
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]

With regards,
Apache Git Services
