ptrendx commented on issue #13650: speed drop when set `multi_precision=True`
URL: https://github.com/apache/incubator-mxnet/issues/13650#issuecomment-448462206
 
 
   It's hard to give a definitive answer without looking at the profile, but 
the most probable explanation is as follows:
    - The `local` kvstore uses the CPU to perform the reduction and the update 
of the parameters. The CPU cannot natively operate on FP16 data (Ivy Bridge 
and newer CPUs have instructions that help with casting to/from FP16, but I'm 
not sure they would be used there), so that step is very slow.
    - Therefore, even without the multi-precision option, the update step is 
probably already close to becoming a bottleneck in your training. The 
multi-precision version needs to do more work: at the very least, it reads 
and writes an FP32 master copy of the weights. SGD has a special fused kernel 
for this, but NAG does not, so the overhead there is bigger. That is probably 
why you see the impact: the cost of the update can no longer be hidden behind 
the computation on the GPU.
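To illustrate why the FP32 master copy matters at all, here is a minimal NumPy sketch of a multi-precision SGD step (this is not MXNet's actual kernel; the function name and the toy numbers are mine). It shows the extra reads/writes involved, and why they are worth paying for: small updates that FP16 alone would round away are preserved in the FP32 copy.

```python
import numpy as np

def sgd_update_multi_precision(weight_fp16, master_fp32, grad_fp16, lr):
    """Hypothetical multi-precision SGD step (illustration only).

    The update is applied to the FP32 master weights, then the FP16
    working copy is refreshed from it -- an extra read and write per
    parameter compared with a plain FP16 update.
    """
    master_fp32 -= lr * grad_fp16.astype(np.float32)
    weight_fp16[:] = master_fp32.astype(np.float16)
    return weight_fp16, master_fp32

# A gradient small enough that a pure-FP16 update would be a no-op:
# near 1.0 the FP16 spacing is ~9.8e-4, so 1.0 - 1e-4 rounds back to 1.0.
w16 = np.ones(4, dtype=np.float16)
w32 = w16.astype(np.float32)   # FP32 master copy
g16 = np.full(4, 1e-4, dtype=np.float16)

for _ in range(10):
    w16, w32 = sgd_update_multi_precision(w16, w32, g16, lr=1.0)

# The FP32 master has accumulated the ten small steps (~0.999),
# while a pure-FP16 update would still sit at exactly 1.0.
```

The extra memory traffic is cheap when a fused kernel does the update and the cast in one pass (as for SGD), but when the optimizer lacks such a kernel (as for NAG here), each step launches additional work that shows up once the GPU is no longer the bottleneck.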

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
