aaronpmishkin opened a new issue #15958: Potential Bug using nd.tile Between Convolutional Layers
URL: https://github.com/apache/incubator-mxnet/issues/15958

## Description

I've encountered an issue using `mxnet.ndarray.tile` to upscale CNN embeddings so that they can be used as input features to another CNN. First, embeddings are computed for a context set using a CNN; these embeddings are then tiled back up to the spatial size of the original images, appended as channels to a query set, and passed through a second CNN. Cross-posted on the forum [here](https://discuss.mxnet.io/t/potential-bug-using-nd-tile-after-convolutional-layers/4688).

## Environment info (Required)

```
----------Python Info----------
Version      : 3.6.8
Compiler     : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
Build        : ('default', 'Dec 29 2018 19:04:46')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.2.1
Directory    : /Users/user/virtenvs/venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
/Users/user/virtenvs/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Version      : 1.3.1
Directory    : /Users/user/virtenvs/venv/lib/python3.6/site-packages/mxnet
Commit Hash  : 19c501680183237d52a862e6ae1dc4ddc296305b
Library      : ['/Users/user/virtenvs/venv/lib/python3.6/site-packages/mxnet/libmxnet.so']
Build features:
No runtime build feature info available
----------System Info----------
Platform     : Darwin-17.7.0-x86_64-i386-64bit
system       : Darwin
node         : 38f9d36df1e1.ant.amazon.com
release      : 17.7.0
version      : Darwin Kernel Version 17.7.0: Sun Jun 2 20:31:42 PDT 2019; root:xnu-4570.71.46~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT MD_CLEAR TSXFA IBRS STIBP L1DF SSBD'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0023 sec, LOAD: 0.7636 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0007 sec, LOAD: 0.8339 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0007 sec, LOAD: 0.8172 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0005 sec, LOAD: 0.8144 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0007 sec, LOAD: 0.2943 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0006 sec, LOAD: 0.0722 sec.
```

## Error Message:

```
---------------------------------------------------------------------------
MXNetError                                Traceback (most recent call last)
<ipython-input-7-793535261150> in <module>
----> 1 loss.asscalar()

~/virtenvs/venv_syne/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py in asscalar(self)
   1988         if self.shape != (1,):
   1989             raise ValueError("The current array is not a scalar")
-> 1990         return self.asnumpy()[0]
   1991
   1992     def astype(self, dtype, copy=True):

~/virtenvs/venv_syne/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py in asnumpy(self)
   1970             self.handle,
   1971             data.ctypes.data_as(ctypes.c_void_p),
-> 1972             ctypes.c_size_t(data.size)))
   1973         return data
   1974

~/virtenvs/venv_syne/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
    249     """
    250     if ret != 0:
--> 251         raise MXNetError(py_str(_LIB.MXGetLastError()))
    252
    253

MXNetError: [10:21:09] src/operator/nn/../tensor/broadcast_reduce_op.h:408: Too many reduction axes from [100,1,1,6,2,14,2,14] to [1,1,1,6,1,14,1,14]

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so   0x000000010cd59b90 libmxnet.so + 15248
[bt] (1) 1   libmxnet.so   0x000000010cd5993f libmxnet.so + 14655
[bt] (2) 2   libmxnet.so   0x000000010cd59569 libmxnet.so + 13673
[bt] (3) 3   libmxnet.so   0x000000010cfc7afb libmxnet.so + 2562811
[bt] (4) 4   libmxnet.so   0x000000010d18bdf9 libmxnet.so + 4414969
[bt] (5) 5   libmxnet.so   0x000000010e05fd05 libmxnet.so + 19963141
[bt] (6) 6   libmxnet.so   0x000000010e2ae9fd MXNDListFree + 561325
[bt] (7) 7   libmxnet.so   0x000000010e235904 MXNDListFree + 65460
[bt] (8) 8   libmxnet.so   0x000000010e237fd8 MXNDListFree + 75400
[bt] (9) 9   libmxnet.so   0x000000010e23b371 MXNDListFree + 88609
```

## Minimum reproducible example

```
## Setup ##
import numpy as np
import mxnet as mx
import mxnet.gluon as gluon
import mxnet.gluon.nn as nn

# make a small CNN to embed the "context"
embedding_block = nn.Sequential()
embedding_block.add(nn.Conv2D(channels=6, kernel_size=5, strides=1, activation='relu', padding=(2,2)))
embedding_block.add(nn.AvgPool2D(pool_size=(2,2), strides=2))

# make a CNN classifier for the query set
query_block = nn.Sequential()
query_block.add(nn.Conv2D(channels=6, kernel_size=5, strides=1, activation='relu'))
query_block.add(nn.AvgPool2D(pool_size=(2,2), strides=2))
query_block.add(nn.Dense(units=1))

embedding_block.collect_params().initialize()
query_block.collect_params().initialize()

## Data Generation ##
# create a simple squared-loss problem
w = np.random.normal(size=(28,28))
# features should be multi-channel images
features = np.random.normal(size=(200, 6, 28, 28))
temp = np.sum(w * features, axis=(1,2,3))
targets = np.sign(np.add(temp[:100], temp[100:]))

context_features = mx.nd.array(features[:100])
query_features = mx.nd.array(features[100:])
targets = mx.nd.array(targets)

## Loss Computation ##
loss = 0.
with mx.autograd.record():
    # add features via nd.tile
    context_embedding = mx.nd.sum(embedding_block(context_features), axis=0)
    channel = context_embedding.tile((100, 1, 2, 2))

    # append new channel to image features
    task_features = mx.nd.concat(query_features, channel)
    preds = query_block(task_features)

    loss = loss + mx.nd.sum(mx.nd.square(mx.nd.squeeze(preds) - targets))

loss.backward()
loss.asscalar()
```

## Steps to reproduce

(Paste the commands you ran that produced the error.)

1. The error is thrown by the above code when `loss.asscalar()` is called.

## What have you tried to solve it?

1. As far as I know, this error is only thrown when `context_embedding` is computed using convolutional layers.
   I initially tried `nd.tile` on a pre-defined `nd.array` and was not able to replicate the issue; see the sketch after this list.
2. The error is also not thrown if the context embeddings are not tiled (e.g. when they are already the same size as the query image channels).
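
For reference, the check from point 1 can be distilled into the sketch below. This is my reconstruction rather than code from the original report: the exact arrays used in that test are not shown, so the shapes simply mirror the reproducible example above. The forward-only `tile` on a plain array evaluates cleanly, while the same `tile` applied to a summed convolutional embedding under `autograd.record()` is the pattern that, per the report, produces the error in the traceback.

```
import mxnet as mx
import mxnet.gluon.nn as nn

# Case 1: nd.tile on a pre-defined NDArray (forward only) -- this did not
# reproduce the error; the tiled result evaluates without complaint.
x = mx.nd.random.normal(shape=(6, 14, 14))
tiled = x.tile((100, 1, 2, 2))       # shape (100, 6, 28, 28)
print(tiled.sum().asscalar())        # evaluates fine

# Case 2: nd.tile on an embedding produced by convolutional layers, recorded
# for autograd -- a trimmed-down version of the reproducible example above.
# Per the report, evaluating the loss after backward() raises
# "Too many reduction axes".
block = nn.Sequential()
block.add(nn.Conv2D(channels=6, kernel_size=5, strides=1, activation='relu', padding=(2, 2)))
block.add(nn.AvgPool2D(pool_size=(2, 2), strides=2))
block.collect_params().initialize()

context = mx.nd.random.normal(shape=(100, 6, 28, 28))
with mx.autograd.record():
    embedding = mx.nd.sum(block(context), axis=0)   # shape (6, 14, 14)
    channel = embedding.tile((100, 1, 2, 2))        # shape (100, 6, 28, 28)
    loss = mx.nd.sum(mx.nd.square(channel))
loss.backward()
loss.asscalar()   # <- MXNetError expected here, as in the traceback above
```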
