cjolivier01 commented on issue #8972: Profiling enhancements, python API, vtune and chrome tracing objects, etc.
URL: https://github.com/apache/incubator-mxnet/pull/8972#issuecomment-360640251

Added aggregate stats and output:

Profile Statistics.
Note that counter items are counter values and not time units.

Device Storage
=================
Name                            Total Count      Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                            -----------      ---------  -------------  -------------  -------------
Memory: gpu/0                          1132   1115334.8750      7307.2642   1147775.1250    570233.9375
Memory: cpu/0                            33   1172000.0000    892000.0000   2964000.0000   1036000.0000

MXNET_C_API
=================
Name                            Total Count      Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                            -----------      ---------  -------------  -------------  -------------
MXNDArraySyncCopyFromCPU                  4       899.2560         2.0720       447.8010       224.8140
MXNDArrayGetDType                       518         1.3230         0.0010         0.0170         0.0026
MXAutogradMarkVariables                   6         1.6960         0.1150         0.4470         0.2827
MXNDArrayGetStorageType                   8         0.0760         0.0060         0.0160         0.0095
MXInvokeCachedOpEx                      240         0.8480         0.0020         0.0140         0.0035
MXAutogradBackwardEx                    120       806.8430         4.7310        34.6040         6.7237
MXNet C API Call Count                 7485         7.4850         0.0010         7.4850         3.7420
MXAutogradSetIsTraining                 250         0.5040         0.0010         0.0160         0.0020
MXNDArrayCreateEx                       254         5.9880         0.0140         0.5370         0.0236
MXImperativeInvokeEx                   1005       277.0570         0.1110         5.8090         0.2757
MXNDArrayGetShape                      1375         4.3920         0.0010         0.0260         0.0032
MXAutogradSetIsRecording                250         0.5830         0.0010         0.0190         0.0023
MXSymbolCreateAtomicSymbol               24         1.4360         0.0290         0.1320         0.0598
MXNet C API Concurrency Count         14969         0.0010         0.0000         0.0010         0.0005
MXNDArraySlice                          242         2.7200         0.0060         0.1010         0.0112
MXNDArrayFree                           733         4.8370         0.0010         0.7430         0.0066
MXNDArraySyncCopyToCPU                  243      4610.6108         0.5690      3066.0559        18.9737
MXNDArrayGetContext                     502         1.3600         0.0010         0.0180         0.0027
MXSymbolInferShape                        1         0.4760         0.4760         0.4760         0.4760
MXNDArrayGetGradState                   720         1.8250         0.0010         0.0160         0.0025
MXSymbolSetAttr                          27         0.2330         0.0050         0.0330         0.0086
MXNDArraySetGradState                   720         2.3990         0.0020         0.0130         0.0033
MXCreateCachedOpEx                        2         8.2910         1.4850         6.8060         4.1455
MXInvokeCachedOp                        240       640.5820         1.5290        20.0970         2.6691

operator
=================
Name                            Total Count      Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                            -----------      ---------  -------------  -------------  -------------
sum                                       2         1.7350         0.8570         0.8780         0.8675
ImperativeBulk                          174      4202.4258         1.0290        55.4100        24.1519
gather_nd                                 4      1195.9850       209.3560       388.6370       298.9962
SetValueOp                              232      1243.8170         0.0920        21.7190         5.3613
_zeros                                   36        93.1100         0.0350        25.0160         2.5864
DeleteVariable                         1908       272.0820         0.0020        60.0490         0.1426
ResourceParallelRandomSetSeed             2       434.1120       217.0490       217.0630       217.0560
stack                                     4       161.8350        34.1780        46.7460        40.4588
WaitForVar                              476         7.3700         0.0040         0.0640         0.0155
zeros_like                               12         7.9950         0.2770         1.2240         0.6662
CopyCPU2CPU                               4       960.8060       213.1390       267.2640       240.2015
_random_uniform                           6       745.5370        35.7100       228.7330       124.2562
SyncCopyGPU2CPU                         468       530.7150         0.1000         4.7960         1.1340
adam_update                             204       427.4140         0.2200         7.3630         2.0952
CopyCPU2GPU                             478      1044.4160         0.3090        61.7840         2.1850
_full                                     6         0.4640         0.0580         0.1030         0.0773
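
For anyone post-processing this output, here is a minimal sketch (not part of the PR) that parses a few rows copied from the tables above and separates timing rows from counter rows. It relies on an observation from these sample rows only: for timed entries the Avg column equals Time / Total Count, while counter entries such as "Memory: gpu/0" and "MXNet C API Call Count" do not follow that relation. The names `parse_rows` and `looks_like_timer` and the tolerance are hypothetical helpers, not MXNet APIs.

```python
import re

# Rows copied verbatim from the aggregate output above (name, count, total, min, max, avg).
SAMPLE = """\
MXNDArraySyncCopyToCPU 243 4610.6108 0.5690 3066.0559 18.9737
ImperativeBulk 174 4202.4258 1.0290 55.4100 24.1519
MXAutogradBackwardEx 120 806.8430 4.7310 34.6040 6.7237
MXNet C API Call Count 7485 7.4850 0.0010 7.4850 3.7420
Memory: gpu/0 1132 1115334.8750 7307.2642 1147775.1250 570233.9375
"""

# Names may contain spaces, so the name is matched lazily and the five
# numeric columns are anchored to the end of the line.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<count>\d+)"
    r"\s+(?P<total>\d+\.\d+)\s+(?P<min>\d+\.\d+)"
    r"\s+(?P<max>\d+\.\d+)\s+(?P<avg>\d+\.\d+)\s*$"
)


def parse_rows(text):
    """Parse whitespace-separated statistic rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip headers, separators, and blank lines
        rows.append({
            "name": match["name"],
            "count": int(match["count"]),
            "total": float(match["total"]),
            "min": float(match["min"]),
            "max": float(match["max"]),
            "avg": float(match["avg"]),
        })
    return rows


def looks_like_timer(row, tol=1e-3):
    """Heuristic (an assumption): timed rows satisfy avg == total / count."""
    return abs(row["total"] / row["count"] - row["avg"]) <= tol


if __name__ == "__main__":
    timers = [r for r in parse_rows(SAMPLE) if looks_like_timer(r)]
    # List timed entries from most to least total time.
    for r in sorted(timers, key=lambda r: r["total"], reverse=True):
        print(f"{r['name']:<28}{r['total']:>12.4f} ms over {r['count']} calls")
```

The heuristic is only a convenience for reading dumps like this one; the section headings (Device Storage, MXNET_C_API, operator) remain the reliable way to tell which rows are counters.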
