FCInter opened a new issue #8023: Error in continuing training a model loaded from file
URL: https://github.com/apache/incubator-mxnet/issues/8023
 
 
   I'm trying to load a model previously saved to files, following the tutorial [here](https://mxnet.incubator.apache.org/tutorials/python/predict_image.html). I use exactly the same commands as shown in the tutorial, but I get the following error message:
   
   ## Error Message:
   ```
   Traceback (most recent call last):
     File "test.py", line 153, in <module>
       num_epoch=num_epoch)
     File "/home/mypath/software/try_mxnet2/mxnet/python/mxnet/module/base_module.py", line 496, in fit
       self.update_metric(eval_metric, data_batch.label)
     File "/home/mypath/software/try_mxnet2/mxnet/python/mxnet/module/module.py", line 735, in update_metric
       self._exec_group.update_metric(eval_metric, labels)
     File "/home/mypath/software/try_mxnet2/mxnet/python/mxnet/module/executor_group.py", line 567, in update_metric
       for label, axis in zip(labels, self.label_layouts):
   TypeError: zip argument #2 must support iteration
   ```
   
   The code for loading the model and resuming training is as follows:
   
   ## Minimum reproducible example
   ```
   import mxnet as mx

   # train_iter, batch_size, n_report and num_epoch are defined elsewhere in the script (not shown).
   sym, arg_params, aux_params = mx.model.load_checkpoint('../model/test_mymodel', 25)
   lenet_model = mx.mod.Module(symbol=sym, context=mx.gpu(), label_names=None)

   lenet_model.bind(for_training=True,
                    data_shapes=[('data', (batch_size, 3, 16, 16))],
                    label_shapes=lenet_model._label_shapes)
   lenet_model.set_params(arg_params, aux_params, allow_missing=True)
   lenet_model.fit(train_iter,
                   optimizer='adam',
                   optimizer_params={'learning_rate': 0.001, 'wd': 0.0005},
                   eval_metric='acc',
                   batch_end_callback=mx.callback.Speedometer(batch_size, n_report),
                   epoch_end_callback=mx.callback.do_checkpoint("../model/test_mymodel", 5),
                   num_epoch=num_epoch)
   ```
   
   As far as I have tested, no error is reported when I comment out the line `lenet_model.fit(...)`. It seems that either training cannot be resumed on a loaded model, or there is something wrong with my code.
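
One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the failing loop is `zip(labels, self.label_layouts)`, and `zip` raises a `TypeError` whenever one of its arguments is `None`. Creating the module with `label_names=None` and binding it with `label_shapes=lenet_model._label_shapes` would plausibly leave the label layouts unset. The snippet below reproduces only that mechanism in plain Python; `update_metric` here is a hypothetical stand-in, not MXNet's implementation.

```python
# Hypothetical stand-in for the loop shown in the traceback; this is not MXNet code.
def update_metric(labels, label_layouts):
    """Pair each label with its layout axis, as the loop in the traceback does."""
    return [(label, axis) for label, axis in zip(labels, label_layouts)]


labels = ["batch_labels"]  # placeholder for the label arrays of one batch

# With label layouts present, the pairing works.
paired = update_metric(labels, [0])

# With no label layouts (None), zip() raises TypeError.
try:
    update_metric(labels, None)
    failed = False
except TypeError as err:
    failed = True
    print("TypeError:", err)
```

If this is indeed the cause, keeping the module's label names (the `Module` default is `('softmax_label',)`; whether that matches this network's output layer is an assumption) and passing `arg_params`, `aux_params` and `begin_epoch` directly to `fit` may be worth trying.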
   
   Any suggestions would be appreciated. Thanks!
   
   Basic information about my system:
   ## Environment info
   Operating System:
   CentOS 6.6
   
   Compiler:
   gcc 5.4.0
   
   Package used (Python/R/Scala/Julia):
   Python
   
   MXNet version:
   0.11.0
   
   Python version and distribution:
   Anaconda 2.7.13
   
   
   
 