ZhaoqiZhu opened a new issue #15069: Profiler RFC: Introducing new APIs
URL: https://github.com/apache/incubator-mxnet/issues/15069
 
 
   # Profiler RFC: Introducing new APIs
   
   ## Introducing New APIs
   
   ### Motivation
   
MXNet comes with a profiler that lets users monitor the performance of their models along two metrics: time and memory consumption. Internally, operator calls, C API calls, and memory allocations/deallocations are represented as events. For function calls, we know the start and finish times of the events and therefore their durations. For memory operations, we know the time of each allocation/deallocation and the size of the memory chunk.
   ![Screen Shot 2019-05-24 at 4 16 39 
PM](https://user-images.githubusercontent.com/10722037/58362190-49f4c080-7e49-11e9-92a3-23664384544b.png)
   
Currently, the profiler has a function called `dumps()` that returns the aggregate statistics, which include min, max, and average for entries in Device Memory, Operator, and C_API. The return value is a string and the data is presented in tabular form (see the screenshot above). While the table is nicely formatted for human readers, it is not easily parsed by a program. Hence there is a need for an API that returns the same aggregate stats as a JSON string.
   
   
   ### Specification
   
   A new API, `get_summary()`, will be introduced. It will have two parameters: 
   
1. `sort_by`, which specifies the statistic by which the entries are sorted. It defaults to “avg”; valid options are [“min”, “max”, “avg”].
2. `ascending`, which specifies the sort order. It defaults to False; valid options are [True, False].
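
The sorting these parameters describe can be sketched in plain Python. `get_summary()` itself is only proposed here, so the entry names and statistic keys below are illustrative stand-ins for the aggregate stats that `dumps()` reports today:

```python
# Sketch of the ordering get_summary() would apply internally.
# Entry names and the "min"/"max"/"avg" keys are illustrative only.
entries = {
    "MXNDArrayWaitAll":      {"min": 205.956, "max": 205.956, "avg": 205.956},
    "MXNDArraySetGradState": {"min": 0.0,     "max": 0.001,   "avg": 0.0006},
}

def sort_entries(entries, sort_by="avg", ascending=False):
    """Order entries by one statistic, mimicking the proposed parameters."""
    if sort_by not in ("min", "max", "avg"):
        raise ValueError("sort_by must be one of 'min', 'max', 'avg'")
    return sorted(entries.items(),
                  key=lambda kv: kv[1][sort_by],
                  reverse=not ascending)

# Default: descending by average time, so the slowest call comes first.
print(sort_entries(entries)[0][0])  # MXNDArrayWaitAll
```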
   
    Expected use cases of `get_summary()` include:
   
1. If customers are more interested in some events or stats than others, they can customize the data presentation to monitor their models more efficiently.
   2. Customers can easily pass the stats to automated performance tests or 
monitoring tools. They do not need to parse the table-like string returned by 
`dumps()`. 
   3. This new API will be immediately useful to a new operator-level benchmark 
tool that Sandeep and I are working on. cwiki: 
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Operator+Benchmarks. 
   
   
The structure of the JSON return value is shown below. It is a four-layer dictionary. The 1st layer splits into “Time”, “Memory”, and “Unit”. The 2nd layer is the category the operators/APIs fall into. The 3rd layer is the individual operator/API. Finally, the 4th layer is the stats. Note that the time unit is ms and the memory unit is bytes.
   
   ```
   {
       "Time": {
           "operator": {
               "mean ": {
                   "Total Count": 2,
                   "Total Time": 0.0490,
                   "Min Time": 0.0240,
                   "Max Time": 0.0250,
                   "Avg Time": 0.0245
               }
               ...
           }
           ,
           "MXNET_C_API": {
               "MXNDArrayWaitAll": {
                   "Total Count": 1,
                   "Total Time": 205.9560,
                   "Min Time": 205.9560,
                   "Max Time": 205.9560,
                   "Avg Time": 205.9560
               }
               ,
               "MXNDArraySetGradState": {
                   "Total Count": 8,
                   "Total Time": 0.0050,
                   "Min Time": 0.0000,
                   "Max Time": 0.0010,
                   "Avg Time": 0.0006
               }
               ...
           }
       }
       ,
       "Memory": {
           "Device Storage": {
               "Memory: cpu/0 ": {
                   "Count": 1,                
                   "Max Usage": 109037988,
                   "Min Usage": 0,
                   "Avg Usage": 54518999
               }
               ,
               "Memory: gpu/0 ": {
                   "Count": 1, 
                   "Max Usage": 109037988,
                   "Min Usage": 0,
                   "Avg Usage": 54518999
               }
           }
           ,
           "Pool Memory": {
               "Pool:gpu/0 Pool Free": {
                   "Count": 1, 
                   "Max Usage": 1,
                   "Min Usage": 2,
                   "Avg Usage": 3
               }
               ,
               "Pool:gpu/0 Pool Used": {
                   "Count": 1, 
                   "Max Usage": 0,
                   "Min Usage": 1,
                   "Avg Usage": 2
               }
               ...
           }
    }
    ,
       "Unit": {
           "Time": "ms",
           "Memory": "byte"        
       }
   }
   ```
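
Because the return value is a JSON string, it can be decoded and walked programmatically from any language binding. A minimal Python sketch, using an abbreviated hand-written payload in place of the real `get_summary()` output:

```python
import json

# Abbreviated, hand-written version of the example payload above.
summary = json.loads("""
{
    "Time": {
        "MXNET_C_API": {
            "MXNDArrayWaitAll": {
                "Total Count": 1,
                "Total Time": 205.956,
                "Min Time": 205.956,
                "Max Time": 205.956,
                "Avg Time": 205.956
            }
        }
    },
    "Unit": {"Time": "ms", "Memory": "byte"}
}
""")

# Layer 1: metric group, layer 2: category, layer 3: call, layer 4: stats.
stats = summary["Time"]["MXNET_C_API"]["MXNDArrayWaitAll"]
unit = summary["Unit"]["Time"]
print("MXNDArrayWaitAll avg: %.4f %s" % (stats["Avg Time"], unit))
# -> MXNDArrayWaitAll avg: 205.9560 ms
```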
   
   
Aside from `get_summary()`, we will also introduce another new API, `reset()`, which clears the aggregate statistics accumulated so far. A typical use case looks like:
   
   ```
   # we don't care what happened before this point
   profiler.reset()
   # model
   profiler.set_state('run')
   run_training_iteration(*next(itr))
   mx.nd.waitall()
   profiler.set_state('stop')
   # end model
   func(profiler.get_summary())
   ```
   
In a more complex case, suppose we want to use the same profiler to benchmark several sections of a model. We can then call `get_summary()` and `reset()` at the end of each section, or at the end of each loop iteration, neatly like:
   
   ```
   # model section 1
   profiler.set_state('run')
   # model code here
   profiler.set_state('stop')
   print(profiler.get_summary())
   profiler.reset()
   
   # model section 2
   profiler.set_state('run')
   # model code here
   profiler.set_state('stop')
   func(profiler.get_summary())
   profiler.reset()
   ```
   
   OR
   
   ```
   # loop through tests functions
   for f in benchmark_tests:
       profiler.set_state('run')
       f()
       mx.nd.waitall()
       profiler.set_state('stop')
       print(profiler.get_summary())
       profiler.reset()
   ```
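
Building on the loop above, the JSON summary also plugs directly into automated performance checks, e.g. failing a benchmark when an operator's average time exceeds a budget. A sketch under the proposed schema; the payload below is a hand-written stand-in for what `profiler.get_summary()` would return:

```python
import json

def check_op_budget(summary_json, category, name, budget_ms):
    """Return True if the entry's average time is within budget_ms.

    summary_json is the JSON string the proposed get_summary() would
    return; callers would pass profiler.get_summary() directly.
    """
    stats = json.loads(summary_json)["Time"][category][name]
    return stats["Avg Time"] <= budget_ms

# Hand-written stand-in for profiler.get_summary() output.
fake = json.dumps({
    "Time": {"operator": {"dot": {"Total Count": 2, "Avg Time": 0.0245}}},
    "Unit": {"Time": "ms", "Memory": "byte"},
})
print(check_op_budget(fake, "operator", "dot", budget_ms=1.0))  # True
```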
   
## Fixing the Output of `dumps()`
   ![Screen Shot 2019-05-23 at 5 23 56 
PM](https://user-images.githubusercontent.com/10722037/58362201-7c062280-7e49-11e9-88ec-3ab102c95795.png)
Currently, the labeling in the table is slightly off. For memory-related entries the labels should be “Usage” rather than “Time”. The “Time (ms)” column also does not make sense for memory entries, so it should be removed for them.
   
The new table labeling should look like:

```
// For time entries
Name    Total Count    Total Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)

// For memory entries
Name    Total Count    Min Usage (MB)    Max Usage (MB)    Avg Usage (MB)
```
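
As a rough sketch of how rows would render under the corrected memory header (column widths and the sample row are illustrative, not actual profiler output):

```python
# Render a memory entry with the corrected "Usage" labels and no Time column.
header = ["Name", "Total Count", "Min Usage (MB)", "Max Usage (MB)", "Avg Usage (MB)"]
row = ["Memory: cpu/0", 1, 0.0, 103.99, 51.99]

fmt = "{:<16}" + "{:>16}" * (len(header) - 1)
print(fmt.format(*header))
print(fmt.format(*[str(c) for c in row]))
```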
   
   
## FAQ
   
1. Why can't we just use the current `dumps()` API?

We can use the current `dumps()` API and get essentially the same information, but then we would need to manually parse the table, which is not a good user experience.
   
2. Why add a new profiler API `get_summary()` in the back-end rather than a Python parser utility that returns JSON?

This way we can expose the new API in different language bindings and make sure the returned data is consistent across them.
   
   
