KellenSunderland commented on issue #14159: [Feature Request] Support fp16 for c++ api
URL: https://github.com/apache/incubator-mxnet/issues/14159#issuecomment-485281091
 
 
   I can give a quick summary of my experience with fp16 and C++ so far: 
   
   I believe the best way to do this is the same as in other front-ends such as Python. As @anirudh2290 mentions, add casts around the numerically sensitive and insensitive sections (batchnorms and softmaxes, for example, should stay in fp32; convolutions and fully connected layers can run in fp16) and then make sure your inputs are in fp16. You should then be able to run inference as normal, and you should see that the fp16 operations really do execute in half precision. A sketch of this cast-insertion pattern with the C++ API follows below.
   
   One caveat is that, depending on your graph, the time spent casting inputs may be more than the time you save by running in fp16. That's where AMP and TensorRT integration can help. They fuse many of the numerically sensitive operators, removing them as standalone nodes from the computation graph, which gives you larger contiguous sections of the graph that can run in fp16. They also fuse the casting operations into the numerical operations themselves, which saves you from doing two full memory copies over your tensors for each cast. These methods should be a much more practical way of running fp16 inference from C++; a sketch of loading such a pre-converted model follows below.
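   The AMP conversion itself currently lives in the Python frontend, but once a model has been converted and exported there (e.g. with `mxnet.contrib.amp.convert_model()` and a checkpoint save), loading it from C++ is just the ordinary load-and-bind flow, since the casts and fused sections are already baked into the symbol. A minimal sketch, with the file names, the "data" input, and the input shape all illustrative:

```cpp
// Sketch: run inference on a model that was already converted to mixed
// precision in Python and exported as model-symbol.json / model-0000.params.
#include <map>
#include <string>
#include "mxnet-cpp/MxNetCpp.h"

using namespace mxnet::cpp;

int main() {
  Context ctx = Context::gpu(0);

  // The casts and fp16 sections are baked into the exported symbol,
  // so the C++ side loads and binds it as usual.
  Symbol net = Symbol::Load("model-symbol.json");

  std::map<std::string, NDArray> args, aux;
  for (const auto& kv : NDArray::LoadToMap("model-0000.params")) {
    // Checkpoint keys are prefixed with "arg:" / "aux:".
    if (kv.first.rfind("arg:", 0) == 0) {
      args[kv.first.substr(4)] = kv.second.Copy(ctx);
    } else if (kv.first.rfind("aux:", 0) == 0) {
      aux[kv.first.substr(4)] = kv.second.Copy(ctx);
    }
  }

  // Input placeholder; shape is illustrative (one 3x224x224 image).
  args["data"] = NDArray(Shape(1, 3, 224, 224), ctx);
  NDArray::WaitAll();

  Executor* exec = net.SimpleBind(ctx, args,
                                  std::map<std::string, NDArray>(),
                                  std::map<std::string, OpReqType>(),
                                  aux);
  exec->Forward(false);  // inference only
  NDArray::WaitAll();

  delete exec;
  MXNotifyShutdown();
  return 0;
}
```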
