RuRo edited a comment on issue #14373: Passing parameters to HybridBlocks and not using them
URL: 
https://github.com/apache/incubator-mxnet/issues/14373#issuecomment-580229041
 
 
   There is a similar problem when there are unused parameters.
   For example, you can have a model like this:
   ```python
   class Test(mx.gluon.nn.HybridBlock): 
       def __init__(self, mode, *args, **kwargs): 
           super().__init__(*args, **kwargs) 
           self.mode = mode 
           with self.name_scope(): 
               self.d1 = mx.gluon.nn.Dense(2) 
               self.d2 = mx.gluon.nn.Dense(3) 
        
       def hybrid_forward(self, F, x, *args, **kwargs): 
           o1 = self.d1(x) 
           o2 = self.d2(x) 
           if self.mode: 
               return o1 # output path o2 is not used
           else: 
               return o1, o2 
   ```
   
    Currently, this model will not hybridize successfully when `mode == True`, because the weights in the `o2` path are "unused".
   
   <details>
   
   ```python
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py:694: UserWarning: 
Parameter test4_dense1_weight, test4_dense1_bias is not used by any 
computation. Is this intended?
     out = self.forward(*args)
   ---------------------------------------------------------------------------
   DeferredInitializationError               Traceback (most recent call last)
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in 
_call_cached_op(self, *args)
      1012         try:
   -> 1013             cargs = [args_without_none[i] if is_arg else i.data()
      1014                      for is_arg, i in self._cached_op_args]
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in <listcomp>(.0)
      1012         try:
   -> 1013             cargs = [args_without_none[i] if is_arg else i.data()
      1014                      for is_arg, i in self._cached_op_args]
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
       564                                "instead." % (self.name, str(ctx), 
self._stype))
   --> 565         return self._check_and_get(self._data, ctx)
       566 
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/parameter.py in 
_check_and_get(self, arr_list, ctx)
       230         if self._deferred_init:
   --> 231             raise DeferredInitializationError(
       232                 "Parameter '%s' has not been initialized yet because 
initialization was " \
   
   DeferredInitializationError: Parameter 'test4_dense0_weight' has not been 
initialized yet because initialization was deferred. Actual initialization 
happens during the first forward pass. Please pass one batch of data through 
the network before accessing Parameters. You can also avoid deferred 
initialization by specifying in_units, num_features, etc., for network layers.
   
   During handling of the above exception, another exception occurred:
   
   KeyError                                  Traceback (most recent call last)
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in 
_deferred_infer_shape(self, *args)
       973         try:
   --> 974             self.infer_shape(*args)
       975         except Exception as e:
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in infer_shape(self, 
*args)
      1074         """Infers shape of Parameters from inputs."""
   -> 1075         self._infer_attrs('infer_shape', 'shape', *args)
      1076 
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _infer_attrs(self, 
infer_fn, attr, *args)
      1070         for i in self.collect_params().values():
   -> 1071             setattr(i, attr, sdict[i.name])
      1072 
   
   KeyError: 'test4_dense1_weight'
   
   During handling of the above exception, another exception occurred:
   
   ValueError                                Traceback (most recent call last)
   <ipython-input-48-a18f0aa96b25> in <module>
   ----> 1 t(mx.nd.array([10]))
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in __call__(self, 
*args)
       692             hook(self, args)
       693 
   --> 694         out = self.forward(*args)
       695 
       696         for hook in self._forward_hooks.values():
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in forward(self, x, 
*args)
      1150                                      'Find all contexts = 
{}'.format(ctx_set))
      1151                 with ctx:
   -> 1152                     return self._call_cached_op(x, *args)
      1153             with ctx:
      1154                 try:
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in 
_call_cached_op(self, *args)
      1014                      for is_arg, i in self._cached_op_args]
      1015         except DeferredInitializationError:
   -> 1016             self._deferred_infer_shape(*args)
      1017             cargs = []
      1018             for is_arg, i in self._cached_op_args:
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in 
_deferred_infer_shape(self, *args)
       976             error_msg = "Deferred initialization failed because 
shape"\
       977                         " cannot be inferred. {}".format(e)
   --> 978             raise ValueError(error_msg)
       979 
       980     def _call_cached_op(self, *args):
   
   ValueError: Deferred initialization failed because shape cannot be inferred. 
'test4_dense1_weight'
   ```
   
   </details>
   
   Having unused parameters is useful, since you might want your pretrain/finetune/evaluation networks to behave differently while remaining compatible with `.save_parameters` and `.load_parameters`, without resorting to `allow_missing` and `ignore_extra`.
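   To make the compatibility concern concrete, here is a toy strict loader (plain Python, purely illustrative, not MXNet's actual implementation) that, like `load_parameters` with the default flags, rejects both missing and extra keys, so every network variant must expose the same parameter set even if some variants never compute with part of it:

```python
def strict_load(model_params, checkpoint):
    # Reject mismatched parameter sets, mirroring the default behaviour
    # of load_parameters (allow_missing=False, ignore_extra=False).
    missing = set(model_params) - set(checkpoint)
    extra = set(checkpoint) - set(model_params)
    if missing:
        raise ValueError("missing parameters: %s" % sorted(missing))
    if extra:
        raise ValueError("extra parameters: %s" % sorted(extra))
    model_params.update(checkpoint)
    return model_params

# A checkpoint saved from the full (mode=False) network round-trips into
# the mode=True network only if the latter still owns d2's parameters,
# even though it never computes with them.
full = {"dense0_weight": 0, "dense0_bias": 0,
        "dense1_weight": 0, "dense1_bias": 0}
eval_net = dict(full)        # keeps the unused d2 parameters
strict_load(eval_net, full)  # succeeds; dropping d2 would raise
```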
   
   I think this issue could be fixed without changing the inner workings too much by adding an `F.nodiscard(o2)` operator. It would be a no-op in `nd` mode and would mark the output as a required computation in `sym` mode. I'm not sure how feasible something like that is.
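   As a sketch of what such an operator would mean for graph pruning, here is a toy dead-code-elimination pass (plain Python; `Node`, `nodiscard`, and `live_nodes` are made-up names for illustration, not MXNet API):

```python
class Node:
    """Toy graph node: a name, its input nodes, and a 'keep' flag."""
    def __init__(self, name, inputs=(), keep=False):
        self.name = name
        self.inputs = tuple(inputs)
        self.keep = keep

def nodiscard(node):
    # Mark the node as required even when no graph output reaches it,
    # i.e. what an F.nodiscard would signal to the symbolic graph pass.
    node.keep = True
    return node

def live_nodes(all_nodes, outputs):
    # Dead-code elimination: everything reachable from the outputs
    # survives, plus anything explicitly marked via nodiscard.
    stack = list(outputs) + [n for n in all_nodes if n.keep]
    seen = set()
    while stack:
        n = stack.pop()
        if n.name in seen:
            continue
        seen.add(n.name)
        stack.extend(n.inputs)
    return seen

x = Node("x")
o1 = Node("o1", [x])
o2 = Node("o2", [x])
nodes = [x, o1, o2]

print("o2" in live_nodes(nodes, [o1]))  # False: o2 is pruned away
nodiscard(o2)
print("o2" in live_nodes(nodes, [o1]))  # True: o2 is kept alive
```

   In this picture, hybridization with `mode == True` prunes the `o2` path and its parameters, which is exactly why they later show up as "unused"; the marker keeps the node (and hence its parameters) in the graph.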
   
   My current workaround is something like
   ```python
        return F.broadcast_add(o1, F.sum(0.0 * o2))  # output path o2 is not used
   ```
   which is both really ugly and potentially inefficient, since it forces the 
unneeded computation.
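   Numerically, the workaround is an identity on `o1` (a quick numpy sanity check of the same expression, not MXNet code):

```python
import numpy as np

o1 = np.array([1.5, -2.0])
o2 = np.array([[3.0, 4.0, 5.0]])

# 0.0 * o2 zeroes every element, the sum collapses to scalar 0.0, and
# broadcasting adds 0.0 elementwise, leaving o1 unchanged.
out = o1 + np.sum(0.0 * o2)
assert np.array_equal(out, o1)
```

   So the trick only exists to force a data dependency on the `o2` path; it still pays for the multiply and the reduction at runtime.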
   
   If the `F.nodiscard` option is too hard to implement, something like
   ```python
   o1 = F.depends_on(o1, o2)
   ```
   could also work. It would basically be the same as `F.broadcast_add(o1, F.sum(0.0 * o2))`, but without performing any actual computation.
