ptrendx commented on issue #17665: No speedup from using FP16 (4 times slower than PyTorch)
URL: https://github.com/apache/incubator-mxnet/issues/17665#issuecomment-592734747

Hmm, I tried your code on both V100 and T4 and could not reproduce your problem.

On V100 I got:
- 0.084 for fp16
- 0.085 for fp16 with `multi_precision=True`
- 0.182 for fp32

On T4 I got:
- 0.27 for fp16
- 0.265 for fp16 with `multi_precision=True`
- 0.55 for fp32

BTW, please use `m.hybridize(static_alloc=True, static_shape=True)`; that gives me about a 10% speed increase in this test (e.g. the V100 fp16 time drops to 0.074 after hybridization).
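For reference, here is a minimal sketch of how that hybridize tip applies to an fp16 Gluon benchmark. This is not the issue author's original script; the model, batch size, and input shape are hypothetical stand-ins:

```python
import time
import mxnet as mx
from mxnet.gluon.model_zoo import vision

ctx = mx.gpu(0)

# Hypothetical model choice; the original benchmark script is not shown here.
m = vision.resnet50_v1(pretrained=False, ctx=ctx)
m.initialize(ctx=ctx)
m.cast('float16')                                  # run the network in fp16
m.hybridize(static_alloc=True, static_shape=True)  # the ~10% speedup tip above

# Hypothetical input shape for illustration.
x = mx.nd.random.uniform(shape=(32, 3, 224, 224), ctx=ctx).astype('float16')
m(x).wait_to_read()                                # warm-up: build the static graph

start = time.time()
for _ in range(10):
    y = m(x)
y.wait_to_read()                                   # sync with the GPU before timing
print('avg iteration time: %.3f s' % ((time.time() - start) / 10))
```

Note that `static_alloc=True` / `static_shape=True` only help when input shapes do not change between calls; with varying shapes the cached graph would need re-planning. The `multi_precision=True` numbers above refer to the optimizer option (e.g. `mx.optimizer.SGD(multi_precision=True)`), which keeps an fp32 copy of the weights while computing in fp16.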
