Neutron3529 opened a new pull request #20188:
URL: https://github.com/apache/incubator-mxnet/pull/20188


   I'm trying to avoid an error that AMP raises when using bfloat16.
   
   The error is:
   ```
   /me/prog/prog-amp.py:77: UserWarning: All children of this Sequential layer 'compose1_' are HybridBlocks. Consider using HybridSequential for the best performance.
     transform_test.hybridize(static_alloc=True,static_shape=True)
   Traceback (most recent call last):
     File "/me/prog/prog-amp.py", line 359, in <module>
       loss0   = loss_fn(output, label)
     File "/me/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 314, in __mul__
       return multiply(self, other)
     File "/me/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 3757, in multiply
       return _ufunc_helper(
     File "/me/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 3576, in _ufunc_helper
       return fn_array(lhs, rhs)
     File "/me/incubator-mxnet/python/mxnet/contrib/amp/amp.py", line 109, in _new_fun
       return f(*args, **kwargs)
     File "<string>", line 52, in broadcast_mul
     File "/me/incubator-mxnet/python/mxnet/_ctypes/ndarray.py", line 82, in _imperative_invoke
       check_call(_LIB.MXImperativeInvokeEx(
     File "/me/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: Traceback (most recent call last):
     File "/me/incubator-mxnet/src/io/../operator/elemwise_op_common.h", line 135
   MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node  at 1-th input: expected bfloat16, got float32
   Error in atexit._run_exitfuncs:
   Traceback (most recent call last):
     File "/me/incubator-mxnet/python/mxnet/base.py", line 587, in _notify_shutdown
       check_call(_LIB.MXNotifyShutdown())
     File "/me/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: Traceback (most recent call last):
     File "/me/incubator-mxnet/src/operator/tensor/./amp_cast.h", line 136
   MXNetError: Unknown type enum 12
   ```
   This was observed on MXNet v1.x, but it also seems to affect v2.0.
   
   Since 30-series RTX cards support bfloat16, there is no need to explicitly disable it with `#ifndef __NVCC__`.
   
   I have not verified that this change works, but it should not make things worse.
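
To illustrate the first failure in the trace, here is a minimal sketch of a same-dtype check for an elementwise operator, written in plain Python. It is a simplified model and not MXNet's implementation: `infer_elemwise_dtype` and its message format are hypothetical, chosen only to mirror the `expected bfloat16, got float32` line in the log.

```python
# Toy model of same-dtype inference for an elementwise op.
# Not MXNet code: infer_elemwise_dtype and its message are hypothetical.

def infer_elemwise_dtype(input_dtypes):
    """Return the single dtype shared by all inputs, or raise on a mismatch.

    None stands for "not yet known" and is compatible with any dtype.
    """
    expected = None
    for i, dtype in enumerate(input_dtypes):
        if dtype is None:
            continue  # unknown dtypes do not constrain the result
        if expected is None:
            expected = dtype  # first known dtype sets the expectation
        elif dtype != expected:
            raise TypeError(
                f"Incompatible attr at {i}-th input: "
                f"expected {expected}, got {dtype}"
            )
    return expected


# Both operands share a dtype, so inference succeeds.
print(infer_elemwise_dtype(["bfloat16", "bfloat16"]))

# One operand is bfloat16 and the other is still float32.
try:
    infer_elemwise_dtype(["bfloat16", "float32"])
except TypeError as err:
    print(err)
```

Under this model the multiply only succeeds when both operands carry the same dtype, which is consistent with one operand of `loss_fn` having been cast to bfloat16 while the other stayed float32.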
   

