rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype 
argument / FP16 performance on Volta
URL: 
https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-371654013
 
 
   Unfortunately, neither suggestion improved the speed. Setting 
MXNET_CUDNN_AUTOTUNE_DEFAULT=2 helped in some cases, but not consistently. 
If it picks the fastest algorithm, why would it not help in all cases? I can 
understand cases where it matches the speed of the other algorithms, but 
sometimes it is slower than setting it to 1. Everything else should stay the 
same, right?
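
   For reference, a minimal sketch of how the two settings could be compared 
under otherwise identical conditions. The helper name `autotune_env` and the 
`train_cmd` placeholder are hypothetical; the only thing taken from the 
discussion above is the MXNET_CUDNN_AUTOTUNE_DEFAULT variable and its values 
1 and 2.

```python
import os


def autotune_env(mode, base_env=None):
    """Return a copy of the environment with the cuDNN autotune mode set.

    `mode` is the value for MXNET_CUDNN_AUTOTUNE_DEFAULT (0, 1, or 2).
    `base_env` defaults to the current process environment.
    """
    if mode not in (0, 1, 2):
        raise ValueError("MXNET_CUDNN_AUTOTUNE_DEFAULT expects 0, 1, or 2")
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = str(mode)
    return env


# Hypothetical usage: run the same training command once per mode and
# compare the reported throughput. `train_cmd` stands in for whatever
# command line is being benchmarked.
#
#   import subprocess
#   for mode in (1, 2):
#       subprocess.run(train_cmd, env=autotune_env(mode), check=True)
```

   Running each mode in a fresh process keeps the comparison clean, since the 
variable is set before the training script starts.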
   
   I'm writing a tutorial for fp16 usage in MXNet. While doing so, I am trying 
to understand some of the changes you made.
   For example, here: 
   
https://github.com/apache/incubator-mxnet/blob/649b08665bad016a71fa8b7a29a184d25217e335/example/image-classification/symbols/resnet.py#L140
   Why does the softmax input need to be cast to fp32? Is it for precision 
reasons?
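
   To make the precision question concrete, here is a small sketch of the 
fp16 limits I have in mind. It uses only the Python standard library (the 
`struct` half-precision format), not MXNet, and the helper names `to_fp16` and 
`exp_fp16` are mine. It only illustrates the range and rounding of fp16; 
whether this is the actual reason for the cast in resnet.py is what I am 
asking.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def exp_fp16(x):
    """Compute exp(x) and store it in fp16.

    Returns inf when the result exceeds the largest finite fp16
    value (65504), which is where struct refuses to pack it.
    """
    try:
        return to_fp16(math.exp(x))
    except OverflowError:
        return float("inf")


# Range: exp() leaves the fp16 range for fairly small logits.
print(exp_fp16(11.0))   # still finite in fp16
print(exp_fp16(12.0))   # overflows to inf

# Precision: near 1.0 the fp16 spacing is about 0.001, so a small
# contribution to a running sum can vanish entirely.
print(to_fp16(1.0 + 0.0004))
```

   A softmax that subtracts the maximum logit first avoids the overflow, but 
the rounding in the sum and in the following log still applies in fp16.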
   
   Is the double buffering with the identity operator that you mentioned 
general enough to go into an official guide? 
   
   Thanks for your help :)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
