caishanli opened a new issue #17890: ndarray.cc:640 Check failed: !is_view
URL: https://github.com/apache/incubator-mxnet/issues/17890
 
 
   ## Description
   I install anaconda and create two venvs: mxnet-cu102 and mxnet-cu102mkl.
   with mxnet-cu102mkl, code in reproduce fail, but ok in mxnet-cu102
   
   ### Error Message
   ```
   Traceback (most recent call last):
     File "test_mkl.py", line 34, in <module>
       mx.nd.waitall()
     File 
"/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py",
 line 200, in waitall
       check_call(_LIB.MXNDArrayWaitAll())
     File 
"/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/base.py",
 line 255, in check_call
       raise MXNetError(py_str(_LIB.MXGetLastError()))
   mxnet.base.MXNetError: [13:47:16] src/ndarray/ndarray.cc:640: Check failed: 
!is_view: 
   Stack trace:
     [bt] (0) 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6d554b)
 [0x7fd69a35c54b]
     [bt] (1) 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::GetMKLDNNData()
 const+0x13c) [0x7fd69d7565cc]
     [bt] (2) 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::op::MKLDNNTransposeForward(nnvm::NodeAttrs
 const&, mxnet::OpContext const&, mxnet::NDArray const&, mxnet::OpReqType 
const&, mxnet::NDArray const&)+0x408) [0x7fd69a411348]
     [bt] (3) 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x36232d4)
 [0x7fd69d2aa2d4]
     [bt] (4) 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::imperative::PushFComputeEx(std::function<void
 (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, 
nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, 
std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, 
std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, 
std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, 
std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, 
std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > 
const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) 
const+0x3c4) [0x7fd69d5fdf74]
     [bt] (5) 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x3896084)
 [0x7fd69d51d084]
     [bt] (6) 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38a4091)
 [0x7fd69d52b091]
     [bt] (7) 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38a74e4)
 [0x7fd69d52e4e4]
     [bt] (8) 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38a27e4)
 [0x7fd69d5297e4]
   ```
   
   ## To Reproduce
   The test code is a mini version from gluoncv's train_yolo_v3.py, as below:
   ```
   import warnings
   import mxnet as mx
   from mxnet import gluon
   import gluoncv as gcv
   gcv.utils.check_version('0.6.0')
   from gluoncv import data as gdata
   from gluoncv.data.transforms.presets.yolo import YOLO3DefaultTrainTransform
   from gluoncv.model_zoo.yolo import get_yolov3
   from gluoncv.model_zoo.mobilenet import get_mobilenet
   
   classes = [str(i) for i in range(67)]
   
   ctx = mx.cpu()
   batch_size = 2
   width, height = 320, 320
   
   base_net = get_mobilenet(multiplier=1, pretrained=True)
   stages = [base_net.features[:33], base_net.features[33:69], 
base_net.features[69:-2]]
   anchors = [[10, 13, 16, 30, 33, 23],
               [30, 61, 62, 45, 59, 119],
               [116, 90, 156, 198, 373, 326]]
   strides = [8, 16, 32]
   net = get_yolov3('mobilenet1.0', stages, [512, 256, 128], anchors, strides, 
classes, 'voc', pretrained=False)
   with warnings.catch_warnings(record=True) as w:
       warnings.simplefilter("always")
       net.initialize(ctx=ctx)`
   
   train_dataset = gdata.VOCDetection()
   train_loader = gluon.data.DataLoader(
       train_dataset.transform(YOLO3DefaultTrainTransform(width, height, net)), 
batch_size, True)
   for epoch in range(10):
       mx.nd.waitall()
       for i, data in enumerate(train_loader):
           print('ok')
   ```
   
   ### Steps to reproduce
   #### conda venv: mxnet-cu102
   1.set classes to 67 ok
   2.set classes to 65, 66, 68, 69 ok
   
   #### conda venv: mxnet-cu102mkl
   1.set classes to 67 fail
   2.set classes to 65, 66, 68, 69 ok
   
   ## Environment
   
   We recommend using our script for collecting the diagnositc information. Run 
the following command and paste the outputs below:
   ```
   curl --retry 10 -s 
https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | 
python
   
   ----------Python Info----------
   Version      : 3.7.6
   Compiler     : GCC 7.3.0
   Build        : ('default', 'Jan  8 2020 19:59:22')
   Arch         : ('64bit', 'ELF')
   ------------Pip Info-----------
   Version      : 20.0.2
   Directory    : 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.6.0
   Directory    : 
/home/caishanli/.conda/envs/mxnet_mkl/lib/python3.7/site-packages/mxnet
   Num GPUs     : 2
   Commit Hash   : 6eec9da55c5096079355d1f1a5fa58dcf35d6752
   ----------System Info----------
   Platform     : Linux-5.5.9-arch1-2-x86_64-with-arch-Arch-Linux
   system       : Linux
   node         : archlinux
   release      : 5.5.9-arch1-2
   version      : #1 SMP PREEMPT Thu, 12 Mar 2020 23:01:33 +0000
   ----------Hardware Info----------
   machine      : x86_64
   processor    : 
   架构:                           x86_64
   CPU 运行模式:                   32-bit, 64-bit
   字节序:                         Little Endian
   Address sizes:                   39 bits physical, 48 bits virtual
   CPU:                             8
   在线 CPU 列表:                  0-7
   每个核的线程数:                 2
   每个座的核数:                   4
   座:                             1
   NUMA 节点:                      1
   厂商 ID:                        GenuineIntel
   CPU 系列:                       6
   型号:                           60
   型号名称:                       Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
   步进:                           3
   CPU MHz:                        3479.679
   CPU 最大 MHz:                   3900.0000
   CPU 最小 MHz:                   800.0000
   BogoMIPS:                       6787.05
   虚拟化:                         VT-x
   L1d 缓存:                       128 KiB
   L1i 缓存:                       128 KiB
   L2 缓存:                        1 MiB
   L3 缓存:                        8 MiB
   NUMA 节点0 CPU:                 0-7
   Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
   Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional 
cache flushes, SMT vulnerable
   Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT 
vulnerable
   Vulnerability Meltdown:          Mitigation; PTI
   Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass 
disabled via prctl and seccomp
   Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and 
__user pointer sanitization
   Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB 
conditional, IBRS_FW, STIBP conditional, RSB filling
   Vulnerability Tsx async abort:   Not affected
   标记:                           fpu vme de pse tsc msr pae mce cx8 apic sep 
mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
                                     syscall nx pdpe1gb rdtscp lm constant_tsc 
arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf p
                                    ni pclmulqdq dtes64 monitor ds_cpl vmx smx 
est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
                                     tsc_deadline_timer aes xsave avx f16c 
rdrand lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shado
                                    w vnmi flexpriority ept vpid ept_ad 
fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat 
pln p
                                    ts md_clear flush_l1d
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0069 
sec, LOAD: 4.6050 sec.
   Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0013 
sec, LOAD: 2.4630 sec.
   Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0437 sec, LOAD: 
2.5802 sec.
   Timing for D2L: http://d2l.ai, DNS: 0.0399 sec, LOAD: 1.0459 sec.
   Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0392 sec, LOAD: 0.9735 sec.
   Timing for FashionMNIST: 
https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, 
DNS: 0.3429 sec, LOAD: 2.1664 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.5987 sec, LOAD: 
7.1233 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0376 sec, 
LOAD: 1.3676 sec.
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to