mk-61 opened a new issue #18896: URL: https://github.com/apache/incubator-mxnet/issues/18896
MXNet already has experimental AMP (Automatic Mixed Precision) support, exposed in the mxnet.contrib package, which automatically casts models to either float16 or bfloat16 (a usage sketch of the current API follows this section). This RFC covers moving it into core, making it a first-class feature, and developing it further.

Here's a rough task breakdown for the initial move:

* Ensure AMP works with numpy ops - i.e., every op appears in one of the casting lists
* API change: make the loss scale public (https://github.com/apache/incubator-mxnet/issues/17507)
* A number of issues have to be resolved to improve the user experience:
  1. Cannot load a trainer with AMP (https://github.com/apache/incubator-mxnet/issues/16858)
  2. There's a CUDA crash (an illegal memory access) in amp_multicast that happens on some models (e.g. Yolo3)
* The actual work of moving the code around and updating import paths

Post move:

1. Layout optimization - upstreaming a feature that already exists in the NVIDIA NGC container. It improves convolution performance by automatically converting between the NCHW and NHWC layouts (see the layout sketch after this list).
2. Explore alternatives to monkey-patching the front-end ops (https://github.com/apache/incubator-mxnet/issues/18697); a sketch of the current mechanism also follows.
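For reference, here is a minimal sketch of the current `mxnet.contrib.amp` training flow, following the pattern from the existing AMP tutorial. It assumes MXNet 1.x; float16 AMP targets NVIDIA GPUs, so it falls back to CPU only for illustration. The loss-scaling step inside `scale_loss` is the piece that #17507 would expose publicly.

```python
import mxnet as mx
from mxnet import autograd, gluon
from mxnet.contrib import amp

# Patch the front-end ops for mixed precision before building the model.
amp.init()  # target_dtype='float16' by default; 'bfloat16' is also supported

# float16 AMP is meant for NVIDIA GPUs; use a GPU context when available.
ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()

net = gluon.nn.Dense(10)
net.initialize(ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
amp.init_trainer(trainer)  # enables dynamic loss scaling in the trainer

data = mx.nd.random.uniform(shape=(8, 16), ctx=ctx)
label = mx.nd.zeros((8,), ctx=ctx)
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

with autograd.record():
    out = net(data)
    loss = loss_fn(out, label)
    # Scale the loss so small fp16 gradients don't underflow; the
    # trainer unscales the gradients again before the update.
    with amp.scale_loss(loss, trainer) as scaled_loss:
        autograd.backward(scaled_loss)
trainer.step(data.shape[0])
```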
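On post-move item 1: the layout pass itself lives in the NGC container, so the snippet below only illustrates the rewrite it automates - wrapping convolutions in NCHW/NHWC transposes so they can run in the layout that fp16 Tensor Core kernels prefer. Nothing here is the actual pass; shapes and ops are illustrative.

```python
import mxnet as mx

# An automatic layout pass rewrites an NCHW graph so convolutions
# execute in NHWC, then restores the original layout afterwards.
x_nchw = mx.nd.random.uniform(shape=(1, 3, 224, 224))

# Transpose the pass would insert before the convolution: NCHW -> NHWC.
x_nhwc = mx.nd.transpose(x_nchw, axes=(0, 2, 3, 1))
assert x_nhwc.shape == (1, 224, 224, 3)

# The convolution itself would run with layout='NHWC', which MXNet's
# Convolution op supports on GPU via cuDNN, e.g.:
#   y_nhwc = mx.nd.Convolution(data=x_nhwc, weight=w, no_bias=True,
#                              kernel=(3, 3), num_filter=8, layout='NHWC')

# Transpose inserted after the convolution: NHWC -> NCHW, so the rest
# of the (unmodified) model still sees the layout it expects.
x_back = mx.nd.transpose(x_nhwc, axes=(0, 3, 1, 2))
assert x_back.shape == x_nchw.shape
```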
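On post-move item 2: a deliberately simplified sketch (not the real MXNet implementation) of what front-end monkey-patching means here. `amp.init()` swaps functions in the `mx.nd`/`mx.sym` namespaces for wrappers that insert casts; the wrapper and the choice of op below are illustrative only.

```python
import mxnet as mx

# Simplified sketch of the current approach: replace a front-end
# function with a wrapper that casts float32 inputs before
# dispatching to the original operator.
_original_op = mx.nd.broadcast_mul  # stand-in op, chosen for illustration

def _amp_wrapped(lhs, rhs, *args, **kwargs):
    # For an op on the fp16 cast list, AMP would cast the inputs down.
    if lhs.dtype == 'float32':
        lhs = lhs.astype('float16')
    if rhs.dtype == 'float32':
        rhs = rhs.astype('float16')
    return _original_op(lhs, rhs, *args, **kwargs)

mx.nd.broadcast_mul = _amp_wrapped  # the patch, installed at init time

a = mx.nd.ones((2, 2))
b = mx.nd.ones((2, 2))
print(mx.nd.broadcast_mul(a, b).dtype)  # float16 after patching
```

The drawback this illustrates, and what #18697 explores alternatives to, is that the patched namespace diverges from the documented one: introspection, pickling, and any code holding a pre-patch reference to the original function silently bypass AMP.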
