> As MXNet v1.3 is likely to be used a lot with Cuda 9.2 I believe the default
> behavior should be changed to use the bug-free but less efficient Kernel.

It would be crazy to do anything else, to be honest. It's a terrible
philosophy to say to users 'you can't rely on MXNet to have correct
behaviour on the fastest GPU; rather, you need to follow the forums/issues
lists in order to know that you need to opt in to a bug-free
implementation'.

> On Jul 24, 2018, at 3:47 AM, Leonard Lausen <[email protected]> wrote:
>
> Currently the default kernel of nn.Embedding backward is known to be
> buggy on P3 instances or when using Cuda 9.2 (the issue also occurs on
> other instances with earlier versions of Cuda, but less often).
>
> https://github.com/apache/incubator-mxnet/issues/11314
>
> There is currently an opt-in for using a bug-free kernel, but it is not
> the default. However, the bug-free kernel is used by default for shapes
> smaller than 16384.
>
> Should MXNet ship a more efficient but buggy kernel in v1.3 or use a
> correct but less efficient kernel by default? As MXNet v1.3 is likely to
> be used a lot with Cuda 9.2 I believe the default behavior should be
> changed to use the bug-free but less efficient Kernel. Correctness and
> providing a good user experience should be No. 1 here (?). Then users
> that want a faster but buggy backward kernel can still select to do so.
> Note this only affects the backward pass.
>
> Hao did related work on improving the take operator
> https://github.com/apache/incubator-mxnet/pull/11326
> https://github.com/apache/incubator-mxnet/pull/11795 which also fixes
> the issue, but he found it to be only "slightly faster" compared to the
> bug-free kernel that is currently under opt-in, while leading to CI
> failures on Windows.
>
> In my experience, there is no speed difference between the current buggy
> kernel and the opt-in bug-free kernel, but the GPU utilization of the
> latter is 100% compared to 60% for the former (benchmark script:
> https://github.com/apache/incubator-mxnet/pull/11795#issuecomment-405808567 )
