sandeep-krishnamurthy opened a new issue #14858: add_n operator with MXNet-MKL producing wrong results when input count >4
URL: https://github.com/apache/incubator-mxnet/issues/14858

**Problem:**

With mxnet-mkl (1.4.0), if the number of input symbols is > 4 and I perform add_n after the FC layers, it produces wrong results, i.e.,

```
data_0 -> fc_0 \
data_1 -> fc_1 \
data_2 -> fc_2 => add_n
data_3 -> fc_3 /
data_4 -> fc_4 /
```

Minimal reproducible code is below. Run the following code, which builds the full network:

```python
import mxnet as mx

num_inp_symbols = 5
data_shape = (5, 5)
hidden_layer_size = 8

input_symbols = [mx.sym.var('data_' + str(i)) for i in range(num_inp_symbols)]
fully_connected_symbols = [mx.sym.FullyConnected(data=input_symbols[i],
                                                 num_hidden=hidden_layer_size,
                                                 name='fc_' + str(i))
                           for i in range(num_inp_symbols)]

# Create final symbol
net = mx.sym.add_n(*fully_connected_symbols)

# Validate topology
# mx.viz.plot_network(net)

mod = mx.mod.Module(symbol=net,
                    data_names=['data_0', 'data_1', 'data_2', 'data_3', 'data_4'],
                    label_names=None)
mod.bind(for_training=False,
         data_shapes=[('data_0', data_shape), ('data_1', data_shape),
                      ('data_2', data_shape), ('data_3', data_shape),
                      ('data_4', data_shape)])
# The original snippet copied parameters from `full_module`, which is not
# defined here. Random initialization is used as a stand-in, so the printed
# values will differ from the output shown below.
mod.init_params()
mod.forward(mx.io.DataBatch([mx.nd.ones(data_shape), mx.nd.ones(data_shape),
                             mx.nd.ones(data_shape), mx.nd.ones(data_shape),
                             mx.nd.ones(data_shape)]))
print(mod.get_outputs()[0])
```

Output

```
[[ 2.2989948  -3.3271918   0.64880913  2.2778904   0.9859241   2.0046096  -1.6065626   1.5986269 ]
 [ 2.2989948  -3.3271918   0.64880913  2.2778904   0.9859241   2.0046096  -1.6065626   1.5986269 ]
 [ 2.2989948  -3.3271918   0.64880913  2.2778904   0.9859241   2.0046096  -1.6065626   1.5986269 ]
 [ 2.2989948  -3.3271918   0.64880913  2.2778904   0.9859241   2.0046096  -1.6065626   1.5986269 ]
 [ 2.2989948  -3.3271918   0.64880913  2.2778904   0.9859241   2.0046096  -1.6065626   1.5986269 ]]
<NDArray 5x8 @cpu(0)>
```

However, let us now compute the output of each FC in the above network (fc0_output, fc1_output, ..., fc4_output). What I observe is that if I compute each FC output individually and sum them up, the result is not the same as running everything together.

```python
constituent_fc0 = fully_connected_symbols[0]
print(constituent_fc0.get_internals().list_outputs())

mod_cons_fc0 = mx.mod.Module(symbol=constituent_fc0, data_names=['data_0'], label_names=None)
mod_cons_fc0.bind(for_training=False, data_shapes=[('data_0', data_shape)])
# Reuse the parameters of the full network so both runs share the same weights
mod_cons_fc0.set_params(mod.get_params()[0], mod.get_params()[1])
mod_cons_fc0.forward(mx.io.DataBatch([mx.nd.ones(data_shape)]))
o1 = mod_cons_fc0.get_outputs()[0]

# and so on for fc1, fc2, fc3, fc4 (giving o2, o3, o4, o5)
# and then do
print(mx.nd.add_n(o1, o2, o3, o4, o5))
```

@ZhennanQin @pengzhao-intel - Can you please help debug this issue?

Please note:
1. Storage type is all dense.
2. Number of inputs > 4.
3. Happens only from the Module APIs, and from mxnet-mkl version 1.3.0 onwards.
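
To decide which of the two results is the wrong one, it may help to check both against a reference that does not depend on MXNet. The sketch below is a pure-Python illustration of that idea, not part of the original report. The helper names `fully_connected`, `add_n_reference` and `max_abs_diff`, as well as the toy weights and biases, are assumptions made for the example.

```python
def fully_connected(x, weight, bias):
    """Reference FC layer: y = x . W^T + b, with weight shaped (num_hidden, in_dim)."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]


def add_n_reference(arrays):
    """Element-wise sum of equally shaped 2-D lists."""
    rows, cols = len(arrays[0]), len(arrays[0][0])
    return [[sum(a[r][c] for a in arrays) for c in range(cols)]
            for r in range(rows)]


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two 2-D lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Same shapes as in the report: five (5, 5) inputs of ones, 8 hidden units.
x = [[1.0] * 5 for _ in range(5)]

# Hypothetical parameters: branch i uses a constant weight 0.5 * (i + 1)
# and a constant bias i, chosen only so the result is easy to verify by hand.
branch_outputs = []
for i in range(5):
    weight = [[0.5 * (i + 1)] * 5 for _ in range(8)]
    bias = [float(i)] * 8
    branch_outputs.append(fully_connected(x, weight, bias))

expected = add_n_reference(branch_outputs)
print(expected[0])
```

With real parameters, the weights and biases would come from `mod.get_params()[0]` (for example the `fc_0_weight` and `fc_0_bias` entries, converted with `asnumpy()`), and `max_abs_diff` could then be applied between this reference and each of the two MXNet results to see which one deviates.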
