salmanmashayekh opened a new issue #16230: Loading Sagemaker NTM Artifacts
URL: https://github.com/apache/incubator-mxnet/issues/16230
 
 
   I have trained a Neural Topic Model (NTM) with SageMaker and am now trying
to load and deploy the model locally. The artifacts include a `symbol` file
and a `parameters` file.
   
   I am using the following to load the model:
   ```
   import mxnet as mx
   # prefix and iteration identify the saved symbol and parameters files
   sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, iteration)
   module_model = mx.mod.Module(symbol=sym, label_names=None, context=mx.cpu())
   ```
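For reference, `mx.model.load_checkpoint(prefix, epoch)` reads the symbol from `prefix-symbol.json` and the parameters from `prefix-NNNN.params`, where `NNNN` is the zero-padded epoch. The sketch below only rebuilds those two file names so they can be compared with the extracted artifacts. The helper name `checkpoint_paths` and the prefix `ntm_model` are hypothetical, and the SageMaker artifacts may need renaming to match this pattern.

```python
import os


def checkpoint_paths(prefix, epoch):
    """Return the (symbol, params) file names that load_checkpoint looks for."""
    symbol_path = "%s-symbol.json" % prefix
    params_path = "%s-%04d.params" % (prefix, epoch)
    return symbol_path, params_path


# Hypothetical prefix and epoch, used only for illustration.
for path in checkpoint_paths("ntm_model", 0):
    print(path, "exists" if os.path.exists(path) else "missing")
```

If either file is reported as missing, the artifacts are probably stored under different names than the ones `load_checkpoint` expects.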
   
   But when I try to `bind` the model:
   ```
   module_model.bind(
       for_training=False,
       data_shapes=[('data', (1, VOCAB_SIZE))]
   )
   ```
   
   It fails with the following error:
   ```
   ---------------------------------------------------------------------------
   MXNetError                                Traceback (most recent call last)
   ~/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/symbol/symbol.py in simple_bind(self, ctx, grad_req, type_dict, stype_dict, group2ctx, shared_arg_names, shared_exec, shared_buffer, **kwargs)
      1622                                                    shared_exec_handle,
   -> 1623                                                    ctypes.byref(exe_handle)))
      1624         except MXNetError as e:
   
   ~/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
       252     if ret != 0:
   --> 253         raise MXNetError(py_str(_LIB.MXGetLastError()))
       254 
   
   MXNetError: Error in operator sample_normal0: vector::_M_range_insert
   
   During handling of the above exception, another exception occurred:
   
   RuntimeError                              Traceback (most recent call last)
   <ipython-input-349-822926d8cf09> in <module>()
         8     for_training = True,
         9     data_shapes = [('data', (1,VOCAB_SIZE))],
   ---> 10     force_rebind = True,
        11 )
        12 
   
   ~/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/module/module.py in bind(self, data_shapes, label_shapes, for_training, inputs_need_grad, force_rebind, shared_module, grad_req)
       427                                                      fixed_param_names=self._fixed_param_names,
       428                                                      grad_req=grad_req, group2ctxs=self._group2ctxs,
   --> 429                                                      state_names=self._state_names)
       430         self._total_exec_bytes = self._exec_group._total_exec_bytes
       431         if shared_module is not None:
   
   
   ~/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/module/executor_group.py in __init__(self, symbol, contexts, workload, data_shapes, label_shapes, param_names, for_training, inputs_need_grad, shared_group, logger, fixed_param_names, grad_req, state_names, group2ctxs)
       277         self.num_outputs = len(self.symbol.list_outputs())
       278 
   --> 279         self.bind_exec(data_shapes, label_shapes, shared_group)
       280 
       281     def decide_slices(self, data_shapes):
   
   
   ~/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/module/executor_group.py in bind_exec(self, data_shapes, label_shapes, shared_group, reshape)
       373             else:
       374                 self.execs.append(self._bind_ith_exec(i, data_shapes_i, label_shapes_i,
   --> 375                                                       shared_group))
       376 
       377         self.data_shapes = data_shapes
   
   
   ~/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/module/executor_group.py in _bind_ith_exec(self, i, data_shapes, label_shapes, shared_group)
       660                                            type_dict=input_types, shared_arg_names=self.param_names,
       661                                            shared_exec=shared_exec, group2ctx=group2ctx,
   --> 662                                            shared_buffer=shared_data_arrays, **input_shapes)
       663         self._total_exec_bytes += int(executor.debug_str().split('\n')[-3].split()[1])
       664         return executor
   
   ~/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/symbol/symbol.py in simple_bind(self, ctx, grad_req, type_dict, stype_dict, group2ctx, shared_arg_names, shared_exec, shared_buffer, **kwargs)
      1627                 error_msg += "%s: %s\n" % (k, v)
      1628             error_msg += "%s" % e
   -> 1629             raise RuntimeError(error_msg)
      1630 
      1631         # update shared_buffer
   
   RuntimeError: simple_bind error. Arguments:
   data: (1, 52908)
   Error in operator sample_normal0: vector::_M_range_insert
   ```
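The error names a single operator, `sample_normal0`, so one possible next step is to look at how that node is declared in the symbol file. The sketch below assumes the symbol file is MXNet's usual JSON graph with a top-level `nodes` list, and that node attributes sit under `attrs`, `attr`, or `param` depending on the MXNet version. The helper `find_nodes` and the tiny example graph are hypothetical and are not the real NTM symbol.

```python
import json


def find_nodes(symbol_json_text, name):
    """Return the op and attributes of every graph node called `name`."""
    graph = json.loads(symbol_json_text)
    matches = []
    for node in graph.get("nodes", []):
        if node.get("name") == name:
            # The attribute key differs between MXNet versions.
            attrs = node.get("attrs") or node.get("attr") or node.get("param") or {}
            matches.append({"op": node.get("op"), "name": name, "attrs": attrs})
    return matches


# Hypothetical miniature graph for illustration only.
example = json.dumps({
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "example_op", "name": "sample_normal0",
         "attrs": {"shape": "(1, 20)"}, "inputs": []},
    ]
})

print(find_nodes(example, "sample_normal0"))
```

On the real artifacts, the text of the symbol file would be passed in place of `example`; the attributes printed for `sample_normal0` may hint at which shape the operator expects.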
   
   From the model architecture (https://arxiv.org/pdf/1809.02687.pdf), I know
that the input is a vector of length `VOCAB_SIZE`.
   
   Any ideas what I am doing wrong?
