charlieyou opened a new issue #15166: asnumpy() fails on float16 gradient
URL: https://github.com/apache/incubator-mxnet/issues/15166

## Description

`asnumpy()` fails on a float16 gradient in both CPU and GPU contexts. Interestingly, printing the variable itself (`data16` in the MRE) always works on CPU, but only every other time on GPU.

## Environment info (Required)

```
----------Python Info----------
Version      : 3.6.5
Compiler     : GCC 7.2.0
Build        : ('default', 'Apr 29 2018 16:14:56')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 10.0.1
Directory    : /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet
Commit Hash  : 134a3e8cd36ee66426deedd3c8add6888378c043
----------System Info----------
Platform     : Linux-4.14.114-82.97.amzn1.x86_64-x86_64-with-glibc2.9
system       : Linux
node         : ip-10-10-82-87
release      : 4.14.114-82.97.amzn1.x86_64
version      : #1 SMP Sun Apr 28 07:27:43 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2701.438
BogoMIPS:              4600.07
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-3
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0017 sec, LOAD: 0.6884 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1339 sec, LOAD: 0.3958 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1478 sec, LOAD: 0.4110 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0270 sec, LOAD: 0.5201 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0032 sec, LOAD: 0.1016 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0016 sec, LOAD: 0.0433 sec.
```

Package used (Python/R/Scala/Julia): Python

## Error Message:

```
---------------------------------------------------------------------------
MXNetError                                Traceback (most recent call last)
<ipython-input-64-e9fe00ede208> in <module>()
     12 test16.backward()
     13
---> 14 data16.grad.asnumpy()

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py in asnumpy(self)
   1994             self.handle,
   1995             data.ctypes.data_as(ctypes.c_void_p),
-> 1996             ctypes.c_size_t(data.size)))
   1997         return data
   1998

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
    251     """
    252     if ret != 0:
--> 253         raise MXNetError(py_str(_LIB.MXGetLastError()))
    254
    255

MXNetError: [20:45:22] src/operator/tensor/./la_op.h:616: This operation only supports 32-bit and 64-bit floating point
Stack trace:
  [bt] (0) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x4ac1eb) [0x7f477e2eb1eb]
  [bt] (1) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x22e3152) [0x7f4780122152]
  [bt] (2) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x307) [0x7f478048df47]
  [bt] (3) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x259adf4) [0x7f47803d9df4]
  [bt] (4) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25a8789) [0x7f47803e7789]
  [bt] (5) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25aba24) [0x7f47803eaa24]
  [bt] (6) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25a6f94) [0x7f47803e5f94]
  [bt] (7) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/zmq/backend/cython/../../../../.././libstdc++.so.6(+0xb86d4) [0x7f47d4f336d4]
  [bt] (8) /lib64/libpthread.so.0(+0x7de5) [0x7f47e1fcade5]
```

## Minimum reproducible example

(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)

```
import mxnet as mx
import numpy as np
from mxnet import autograd

def test_func(a):
    q, l = mx.nd.linalg.gelqf(a)
    return mx.nd.sum(l)

data16 = mx.nd.random.normal(shape=(2, 3), ctx=mx.gpu(), dtype=np.float16)
data16.attach_grad()

with autograd.record():
    test16 = test_func(data16)
test16.backward()

data16.asnumpy()       # this works on cpu, but only half the time on gpu
data16.grad.asnumpy()  # this fails on both
test16.asnumpy()       # this fails too
```

## Steps to reproduce

(Paste the commands you ran that produced the error.)

Run the script above.

## What have you tried to solve it?

N/A
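The error string suggests the `la_op` kernels only cover float32/float64, so a possible interim workaround is to cast up to float32 before the linalg call and cast the result back to float16 afterwards. Below is a minimal sketch of that cast-up/cast-down pattern in plain NumPy (NumPy has no direct `gelqf`, so `np.linalg.qr` of the transpose stands in for the LQ factorization; all names here are illustrative, not part of the report):

```python
import numpy as np

# Original float16 data (a 2x3 example, mirroring the MRE's shape).
a16 = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]], dtype=np.float16)

# Cast up: LAPACK-backed factorizations typically support only float32/float64.
a32 = a16.astype(np.float32)

# LQ factorization of a32 via QR of its transpose: a32.T = Q_t @ R_t,
# hence a32 = L @ Q with L = R_t.T (lower-triangular) and Q = Q_t.T.
q_t, r_t = np.linalg.qr(a32.T)
l, q = r_t.T, q_t.T

# Verify the reconstruction, then cast back down to float16 for storage.
assert np.allclose(l @ q, a32, atol=1e-3)
l16 = l.astype(np.float16)
```

In MXNet terms, the analogous move would presumably be `data16.astype(np.float32)` before `attach_grad()`, casting the resulting gradient back to float16 once `backward()` has run.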
