ptrendx commented on issue #17665: No speedup from using FP16 (4 times slower 
than PyTorch)
URL: 
https://github.com/apache/incubator-mxnet/issues/17665#issuecomment-592734747
 
 
Hmm, I tried your code on both V100 and T4 and could not reproduce your problem.
On V100 I got:
 - 0.084 for fp16
 - 0.085 for fp16 with multi_precision=True
 - 0.182 for fp32

On T4 I got:
 - 0.27 for fp16
 - 0.265 for fp16 with multi_precision=True
 - 0.55 for fp32
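
Since the issue's benchmark script is not shown in this thread, here is a minimal sketch of the kind of fp16-vs-fp32 timing comparison above; the model, batch size, and input shape are illustrative assumptions, not taken from the issue:

```python
import time
import mxnet as mx
from mxnet import autograd, gluon
from mxnet.gluon.model_zoo import vision

ctx = mx.gpu(0)

# Illustrative choices (not from the issue): resnet50_v1, batch of 32 at 224x224.
net = vision.resnet50_v1()
net.initialize(mx.init.Xavier(), ctx=ctx)
net.cast('float16')  # skip this cast (and the .astype below) for the fp32 run

# multi_precision=True keeps fp32 master weights alongside the fp16 parameters.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1, 'multi_precision': True})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

data = mx.nd.random.uniform(shape=(32, 3, 224, 224), ctx=ctx).astype('float16')
label = mx.nd.zeros((32,), ctx=ctx)  # labels can stay fp32

for i in range(110):
    if i == 10:             # 10 warmup iterations, then start the clock
        mx.nd.waitall()
        start = time.time()
    with autograd.record():
        loss = loss_fn(net(data), label)
    loss.backward()
    trainer.step(32)
mx.nd.waitall()             # MXNet is asynchronous; sync before reading the timer
print('avg time per iteration: %.3f' % ((time.time() - start) / 100))
```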
   
BTW - please use `m.hybridize(static_alloc=True, static_shape=True)`; that gives about a 10% speed increase for me in this test (e.g. the V100 fp16 time drops to 0.074 after hybridization).
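
Continuing the sketch above (same `net` and `data`), the hybridization change is one line:

```python
# hybridize() compiles the imperative Gluon graph; static_alloc/static_shape
# additionally let MXNet plan and reuse memory buffers across iterations,
# which is where the ~10% improvement comes from.
net.hybridize(static_alloc=True, static_shape=True)

# The first call after hybridize() builds and caches the graph; subsequent
# calls with the same input shape reuse it.
out = net(data)
mx.nd.waitall()
```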
