maybeLee opened a new issue #20805:
URL: https://github.com/apache/incubator-mxnet/issues/20805
## Description
I suspect this is caused by incorrect size inference of `L[!Ta].Size()` or `R[Tb].Size()` in `src/operator/tensor/dot-inl.h:1241`, but I am not sure.

I was constructing a model using Keras-MXNet as the high-level API; the model summary is as follows:
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_12 (InputLayer)        (None, 28, 28, 3)         0
_________________________________________________________________
average_pooling2d_8 (Average (None, 28, 28, 3)         0
_________________________________________________________________
flatten_11 (Flatten)         (None, 2352)              0
_________________________________________________________________
dense_11 (Dense)             (None, 100)               235300
=================================================================
Total params: 235,300
Trainable params: 235,300
Non-trainable params: 0
_________________________________________________________________
```
But I face the following error:
```
MXNetError: MXNetError: Error in operator dot16: [06:11:14] ../src/operator/tensor/./dot-inl.h:1241: Check failed: L[!Ta].Size() == R[Tb].Size() (2523 vs. 2352) : dot shape error: [10,2523] X [2352,100]

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/mxnet/symbol/symbol.py in simple_bind(self, ctx, grad_req, type_dict, stype_dict, group2ctx, shared_arg_names, shared_exec, shared_buffer, **kwargs)
   1942                 error_msg += "%s: %s\n" % (k, v)
   1943             error_msg += "%s" % e
-> 1944             raise RuntimeError(error_msg)
   1945
   1946         # update shared_buffer

RuntimeError: simple_bind error. Arguments:
/input_121: (10, 28, 28, 3)
MXNetError: Error in operator dot16: [06:11:14] ../src/operator/tensor/./dot-inl.h:1241: Check failed: L[!Ta].Size() == R[Tb].Size() (2523 vs. 2352) : dot shape error: [10,2523] X [2352,100]
```
After checking the source code in `dot-inl.h`, I noticed that `L[!Ta].Size()` and `R[Tb].Size()` are inconsistent, which causes this error. However, the output size of the Flatten layer should be 2352, not 2523. That is why I guess the inference of `L[!Ta].Size()` may be wrong.
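One arithmetic observation (mine, not from the traceback) that may help localize this: 2352 = 28 × 28 × 3 is the expected Flatten output, while 2523 = 29 × 29 × 3, i.e. a pooled feature map with exactly one extra row and one extra column:
```
# expected Flatten size vs. the size in the error message
print(28 * 28 * 3)  # 2352 -- what Keras reports for the Flatten layer
print(29 * 29 * 3)  # 2523 -- the lhs size MXNet infers: a 29x29x3 tensor
```
So the mismatched operand looks like a 29 × 29 × 3 tensor; whether that shape comes from the pooling shape inference or from `DotShape` itself is the open question (see the sketches below).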
## To Reproduce
```
import os
os.environ["KERAS_BACKEND"] = "mxnet"

import numpy as np
import keras

width = 28
x = keras.layers.Input((width, width, 3))
layer_stack = [
    # strides=1 with an even pool_size and padding="same" triggers the crash
    keras.layers.AveragePooling2D(pool_size=[2, 2], strides=1, padding="same"),
    # keras.layers.AveragePooling1D(pool_size=1, strides=1, padding="same",
    #                               data_format="channels_last"),
    keras.layers.Flatten(),
    keras.layers.Dense(100),
]
y = x
for layer in layer_stack:
    y = layer(y)
model = keras.Model(x, y)
model.summary()

data = np.random.rand(10, width, width, 3)
res = model.predict(data)
```
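To check whether the unexpected size already appears at the pooling stage, here is a minimal pure-MXNet sketch (my own, using the MXNet 1.x symbol API; `pad=(1,1)` is an assumption about what Keras-MXNet passes for `padding="same"`, see point 3 below):
```
import mxnet as mx

data = mx.sym.Variable("data")
pool = mx.sym.Pooling(data=data, kernel=(2, 2), pool_type="avg",
                      stride=(1, 1), pad=(1, 1))
# NCHW layout: batch=10, 3 channels, 28x28 spatial
_, out_shapes, _ = pool.infer_shape(data=(10, 3, 28, 28))
print(out_shapes)  # [(10, 3, 29, 29)] would explain 29*29*3 = 2523 after Flatten
```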
### Steps to reproduce
Run the code above with Keras-MXNet installed, or use the Colab notebook:
https://colab.research.google.com/drive/1i-gOcbdah1-S_Hdk5_xme0u0CsRmL2Le?usp=sharing
## What have you tried to solve it?
I haven't figured out how to fix this issue yet, but I can share my findings:
1. This bug exists in the latest MXNet release (1.9.1).
2. The bug is triggered when `strides=1` is set and `pool_size` is even in the `AveragePooling2D` layer; the `Flatten` and `Dense` layers are also necessary to trigger the crash (see the sketch after this list for a possible explanation).
3. I checked the arguments that Keras-MXNet sends to MXNet's `mx.sym.Pooling`: the padding is `(1,1)`, which is consistent with `padding="same"`, so I guess Keras-MXNet is not the one to blame.
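As a possible explanation for why only an even `pool_size` triggers this (my own reasoning, based on the standard output-size formula with symmetric padding; I have not verified this against the MXNet pooling code):
```
import math

def pool_out(n, kernel, stride, pad):
    # standard pooling output size with symmetric padding on both sides
    return math.floor((n + 2 * pad - kernel) / stride) + 1

# A "same" output (28) for kernel=2, stride=1 needs a total padding of 1,
# which is odd; a symmetric pad=(1,1) adds 2 and over-pads by one per axis:
print(pool_out(28, kernel=2, stride=1, pad=1))  # 29, not 28
```
An odd `pool_size` with `strides=1` needs an even total padding, which a symmetric `pad` can represent exactly; that would match observation 2 above.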
In conclusion, I guess the issue is caused by incorrect size inference in the `DotShape` function in `src/operator/tensor/dot-inl.h`; specifically, it is possibly caused by these four branches:
```
// src/operator/tensor/dot-inl.h:1221
if (Ta) {
  L[0] = mshadow::Shape1(lshape[0]);
  L[1] = lshape.ndim() > 1 ?
         mxnet::TShape(&lshape[1], lshape.end()) : mxnet::TShape(1, 1);
} else {
  L[0] = lshape.ndim() > 1 ?
         mxnet::TShape(&lshape[0], &lshape[lshape.ndim()-1]) : mxnet::TShape(1, 1);
  L[1] = mshadow::Shape1(lshape[lshape.ndim()-1]);
}
if (Tb) {
  R[0] = rshape.ndim() > 1 ?
         mxnet::TShape(&rshape[0], &rshape[rshape.ndim()-1]) : mxnet::TShape(1, 1);
  R[1] = mshadow::Shape1(rshape[rshape.ndim()-1]);
} else {
  R[0] = mshadow::Shape1(rshape[0]);
  R[1] = rshape.ndim() > 1 ?
         mxnet::TShape(&rshape[1], rshape.end()) : mxnet::TShape(1, 1);
}
```
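To make those branches easier to reason about, here is a rough Python paraphrase (my own sketch, not MXNet code) of how `L` and `R` are split and what the check at line 1241 compares, for inputs with more than one dimension:
```
from math import prod

def dot_shape_check(lshape, rshape, Ta=False, Tb=False):
    # Mirrors the four branches above for ndim > 1 inputs.
    L = [(lshape[0],), lshape[1:]] if Ta else [lshape[:-1], (lshape[-1],)]
    R = [rshape[:-1], (rshape[-1],)] if Tb else [(rshape[0],), rshape[1:]]
    # The failing check: L[!Ta].Size() == R[Tb].Size()
    return prod(L[int(not Ta)]), prod(R[int(Tb)])

# The shapes from the error message:
print(dot_shape_check((10, 2523), (2352, 100)))  # (2523, 2352), the reported pair
```
For plain 2-D inputs with `Ta = Tb = False`, the branches reduce to comparing the last dimension of the lhs with the first dimension of the rhs, so the split itself looks straightforward; the remaining question is where the `[10,2523]` lhs shape was inferred upstream.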
Can you help check whether this is the actual bug and, if so, how we can fix it? Thanks!