sandeep-krishnamurthy opened a new issue #14858: add_n operator with MXNet-MKL 
producing wrong results when input count >4
URL: https://github.com/apache/incubator-mxnet/issues/14858
 
 
   **Problem:**
   
   With mxnet-mkl (1.4.0), when the number of input symbols is greater than 4, applying add_n to the outputs of FC layers produces wrong results.
   i.e.,
   ```
   data_0 -> fc_0  \
   data_1 -> fc_1   \ 
   data_2 -> fc_2      => add_n
   data_3 -> fc_3  /
   data_4 -> fc_4 /
   ```
   Minimal reproducible code is below. It builds and runs the full network:
   
   ```python
   import mxnet as mx
   
   num_inp_symbols = 5
   data_shape = (5,5)
   hidden_layer_size = 8
   
   input_symbols = [mx.sym.var('data_'+str(i)) for i in range(num_inp_symbols)]
   # One FullyConnected layer per input symbol
   fully_connected_symbols = [mx.sym.FullyConnected(data=input_symbols[i],
                                                    num_hidden=hidden_layer_size,
                                                    name='fc_'+str(i))
                              for i in range(num_inp_symbols)]
   
   #Create final symbol
   net = mx.sym.add_n(*fully_connected_symbols)
   #Validate topology
   #mx.viz.plot_network(net)
   
   mod = mx.mod.Module(symbol=net,
                       data_names=['data_0', 'data_1', 'data_2', 'data_3', 'data_4'],
                       label_names=None)
   mod.bind(for_training=False,
            data_shapes=[('data_0', data_shape), ('data_1', data_shape),
                         ('data_2', data_shape), ('data_3', data_shape),
                         ('data_4', data_shape)])
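   # NOTE: `full_module` is not defined in this snippet; the original line was
   #   mod.set_params(full_module.get_params()[0], full_module.get_params()[1])
   # As an assumption to keep the snippet self-contained, initialize the
   # parameters instead (printed values will then differ from the output below).
   mod.init_params()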
   
   mod.forward(mx.io.DataBatch([mx.nd.ones(data_shape), mx.nd.ones(data_shape),
                                mx.nd.ones(data_shape), mx.nd.ones(data_shape),
                                mx.nd.ones(data_shape)]))
   print(mod.get_outputs()[0])
   ```
   Output
   ```
   [[ 2.2989948  -3.3271918   0.64880913  2.2778904   0.9859241   2.0046096
     -1.6065626   1.5986269 ]
    [ 2.2989948  -3.3271918   0.64880913  2.2778904   0.9859241   2.0046096
     -1.6065626   1.5986269 ]
    [ 2.2989948  -3.3271918   0.64880913  2.2778904   0.9859241   2.0046096
     -1.6065626   1.5986269 ]
    [ 2.2989948  -3.3271918   0.64880913  2.2778904   0.9859241   2.0046096
     -1.6065626   1.5986269 ]
    [ 2.2989948  -3.3271918   0.64880913  2.2778904   0.9859241   2.0046096
     -1.6065626   1.5986269 ]]
   <NDArray 5x8 @cpu(0)>
   ```
   
   However, let us now compute the output of each FC layer in the above network (fc0_output, fc1_output, ..., fc4_output). What I observe is that if I compute each FC output individually and sum them up, the result is not the same as running everything together.
   
   ```python
   constituent_fc0 = fully_connected_symbols[0]
   print(constituent_fc0.get_internals().list_outputs())
   
   mod_cons_fc0 = mx.mod.Module(symbol=constituent_fc0, data_names=['data_0'],
                                label_names=None)
   mod_cons_fc0.bind(for_training=False, data_shapes=[('data_0', data_shape)])
   mod_cons_fc0.set_params(mod.get_params()[0], mod.get_params()[1])
   mod_cons_fc0.forward(mx.io.DataBatch([mx.nd.ones(data_shape)]))
   o1 = mod_cons_fc0.get_outputs()[0]
   
   #and so on for fc1, fc2, fc3, fc4
   #and then do
   print(mx.nd.add_n(o1, o2, o3, o4, o5))
   ```
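
To make the expected result concrete, here is a minimal, framework-independent sketch of what add_n over several FC outputs should compute. It is not MXNet code and not how MXNet or MKL-DNN implements these operators: `fully_connected`, `add_n`, and `max_abs_diff` are hypothetical pure-Python helpers, and the random weights and biases are placeholders for the values one would presumably read from `mod.get_params()`. With all-ones inputs, every output row should equal the sum over all layers of (row sums of the weight matrix + bias), which gives an independent number to compare both code paths against.

```python
import random


def fully_connected(x, weight, bias):
    """Reference FullyConnected: y = x . weight^T + bias.

    x is batch x in_dim, weight is num_hidden x in_dim, bias has num_hidden entries.
    """
    return [[sum(xv * wv for xv, wv in zip(row, w_row)) + b
             for w_row, b in zip(weight, bias)]
            for row in x]


def add_n(*arrays):
    """Reference add_n: element-wise sum of equally shaped 2-D lists."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*arrays)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2-D lists."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))


random.seed(0)
num_inputs, batch, in_dim, num_hidden = 5, 5, 5, 8

# Placeholder parameters (assumption): stand-ins for the real FC weights/biases.
weights = [[[random.uniform(-1, 1) for _ in range(in_dim)]
            for _ in range(num_hidden)]
           for _ in range(num_inputs)]
biases = [[random.uniform(-1, 1) for _ in range(num_hidden)]
          for _ in range(num_inputs)]
ones = [[1.0] * in_dim for _ in range(batch)]

# Path 1: compute each FC output separately, then sum them.
fc_outputs = [fully_connected(ones, w, b) for w, b in zip(weights, biases)]
expected = add_n(*fc_outputs)

# Path 2: closed form for all-ones inputs, one value per hidden unit.
closed_form = [sum(sum(w[j]) + b[j] for w, b in zip(weights, biases))
               for j in range(num_hidden)]

print(max_abs_diff(expected, [closed_form] * batch))
```

The same `max_abs_diff` style of comparison could be applied to the two MXNet results after converting them with `asnumpy().tolist()`; a difference well above float32 rounding error would confirm the mismatch described here.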
   
   @ZhennanQin @pengzhao-intel - Can you please help debug this issue?
   Please note:
   1. Storage type is all dense.
   2. Number of inputs > 4.
   3. Happens only with the Module API, and from mxnet-mkl 1.3.0 onwards.
