Re: Should MXNet 1.3 contain a buggy version of nn.Embedding backward by default?

Naveen Swamy Mon, 23 Jul 2018 21:13:13 -0700

If it is buggy, why does it matter whether it is performant or not? I do not
see the rationale for making the correct version opt-in only.



On Mon, Jul 23, 2018 at 6:47 PM, Leonard Lausen <[email protected]>
wrote:

> Currently the default kernel of nn.Embedding backward is known to be
> buggy on P3 instances or when using CUDA 9.2 (the issue also occurs on
> other instances with earlier versions of CUDA, but less often).
>
> https://github.com/apache/incubator-mxnet/issues/11314
>
> There is currently an opt-in for using a bug-free kernel, but it is not
> the default. However, the bug-free kernel is used by default for shapes
> smaller than 16384.
>
> Should MXNet ship a more efficient but buggy kernel in v1.3 or use a
> correct but less efficient kernel by default? As MXNet v1.3 is likely to
> be used a lot with CUDA 9.2, I believe the default behavior should be
> changed to use the bug-free but less efficient kernel. Correctness and
> a good user experience should be No. 1 here (?). Users who want the
> faster but buggy backward kernel can then still opt in to it.
> Note this only affects the backward pass.
>
> Hao did related work on improving the take operator
> https://github.com/apache/incubator-mxnet/pull/11326
> https://github.com/apache/incubator-mxnet/pull/11795 which also fixes
> the issue, but he found it to be only "slightly faster" than the
> bug-free kernel that is currently opt-in, and it leads to CI
> failures on Windows.
>
> In my experience, there is no speed difference between the current buggy
> kernel and the opt-in bug-free kernel, but the GPU utilization of the
> latter is 100% compared to 60% for the former (benchmark script:
> https://github.com/apache/incubator-mxnet/pull/11795#issuecomment-405808567 )
>
