sxjscience opened a new issue #18116:
URL: https://github.com/apache/incubator-mxnet/issues/18116


   I find that the parameter save/load logic in Gluon does not respect the `prefix` of the layers in the network.
   
   Consider the following example: I create two networks, `Foo` and `Foo2`, which both have a single dense layer with `prefix='layer_'` but assign it to different attribute names, `self.l1` in one and `self.l2` in the other. At first glance, because these two layers **share the same prefix**, we should be able to share the parameters, i.e., directly load the parameters saved from `foo` into `foo2`.
   
   However, the following code will trigger an error:
   
   ```python
   import mxnet as mx
   from mxnet.gluon import HybridBlock, nn
   import tempfile
   import os
   mx.npx.set_np()
   
   
   class Foo(HybridBlock):
       def __init__(self, prefix=None, params=None):
           super().__init__(prefix=prefix, params=params)
           with self.name_scope():
               self.l1 = nn.Dense(16, prefix='layer_')
   
       def hybrid_forward(self, F, x):
           return self.l1(x)
   
   
   class Foo2(HybridBlock):
       def __init__(self, prefix=None, params=None):
           super().__init__(prefix=prefix, params=params)
           with self.name_scope():
               self.l2 = nn.Dense(16, prefix='layer_')
   
       def hybrid_forward(self, F, x):
           return self.l2(x)
   
   
   foo = Foo()
   foo.initialize()
   foo(mx.np.ones((32, 6)))
   foo2 = Foo2()
   with tempfile.TemporaryDirectory() as dir_path:
       foo.save_parameters(os.path.join(dir_path, 'test.params'))
       foo2.load_parameters(os.path.join(dir_path, 'test.params'))
   ```
   
   Error message:
   ```
    AssertionError: Parameter 'l2.weight' is missing in file '/tmp/tmpkf_n3w3s/test.params', which contains parameters: 'l1.weight', 'l1.bias'. Set allow_missing=True to ignore missing parameters.
   ```
   
   Thus, Gluon uses the attribute name, not the prefix, when saving, loading, and sharing the parameters.
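   
   For comparison, the prefix-based `ParameterDict` path keys parameters by their prefixed names. Below is a minimal sketch of a possible workaround that continues the example above, assuming the 1.x-style `collect_params().save()`/`.load()` API with `strip_prefix`/`restore_prefix` (and that this path supports the `npx` array mode used here); the file name is illustrative:
   
   ```python
   # Sketch of a possible workaround via the name-based ParameterDict API
   # (assumes the 1.x-style strip_prefix/restore_prefix arguments).
   with tempfile.TemporaryDirectory() as dir_path:
       param_path = os.path.join(dir_path, 'test_by_name.params')
       # Save by parameter name, dropping foo's auto-generated block prefix.
       foo.collect_params().save(param_path, strip_prefix=foo.prefix)
       # Load into foo2, re-attaching foo2's block prefix; the shared 'layer_'
       # names then line up and no parameter is reported missing.
       foo2.collect_params().load(param_path, mx.cpu(), restore_prefix=foo2.prefix)
   ```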
   
   To understand the problem, consider the following example, in which we create a network with 4 dense layers that all share the same parameters. When we call `save_parameters`, the saved file should ideally contain only a single copy of the weights. However, it currently contains 4 copies. This is **not acceptable in a deployment setting**, where there is a hard constraint on the size of the artifact.
   
   ```python
   import mxnet as mx
   from mxnet.gluon import HybridBlock, nn
   import tempfile
   import os
   mx.npx.set_np()
   
   
   class Foo(HybridBlock):
       def __init__(self, prefix=None, params=None):
           super().__init__(prefix=prefix, params=params)
           with self.name_scope():
               self.l1 = nn.Dense(2048, prefix='layer_')
               self.l2 = nn.Dense(2048, params=self.l1.collect_params())
               self.l3 = nn.Dense(2048, params=self.l1.collect_params())
               self.l4 = nn.Dense(2048, params=self.l1.collect_params())
   
       def hybrid_forward(self, F, x):
           return self.l4(self.l3(self.l2(self.l1(x))))
   
   
   class Foo2(HybridBlock):
       def __init__(self, prefix=None, params=None):
           super().__init__(prefix=prefix, params=params)
           with self.name_scope():
               self.l1 = nn.Dense(2048, prefix='layer_')
   
       def hybrid_forward(self, F, x):
           return self.l1(x)
   
   
   foo = Foo()
   foo.initialize()
   foo(mx.np.ones((32, 2048)))
   foo2 = Foo2(params=foo.collect_params())
   
   with tempfile.TemporaryDirectory() as dir_path:
       foo.save_parameters(os.path.join(dir_path, 'foo1_save_parameters.params'))
       foo2.save_parameters(os.path.join(dir_path, 'foo2_save_parameters.params'))
       print('Keys by collect_params():', foo.collect_params().keys())
       print('Keys by loading the shared parameters:',
             mx.npx.load(os.path.join(dir_path, 'foo1_save_parameters.params')).keys())
       print('Four shared layer artifact size:',
             os.stat(os.path.join(dir_path, 'foo1_save_parameters.params')).st_size)
       print('One layer artifact size:',
             os.stat(os.path.join(dir_path, 'foo2_save_parameters.params')).st_size)
   
   ```
   
   The output is as follows. The file written by `foo.save_parameters()` is roughly 4 times the size of the one written by `foo2.save_parameters()`, even though the two networks share exactly the same parameters and the two files should be the same size.
   ```
    Keys by collect_params(): odict_keys(['foo3_layer_weight', 'foo3_layer_bias'])
    Keys by loading the shared parameters: dict_keys(['l1.weight', 'l1.bias', 'l2.weight', 'l2.bias', 'l3.weight', 'l3.bias', 'l4.weight', 'l4.bias'])
   Four shared layer artifact size: 67142080
   One layer artifact size: 16785544
   ```
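   
   For comparison, here is a sketch of the same size check going through the name-based `ParameterDict` path (same assumptions about the 1.x-style API as in the sketch above; the file name is illustrative). Since `foo.collect_params()` holds only one weight/bias pair, this path should write a single copy of the weights:
   
   ```python
   # Sketch: save by parameter name via collect_params(). The ParameterDict of
   # foo holds only the single shared weight/bias pair shown above, so the
   # shared parameters should be written to disk only once.
   with tempfile.TemporaryDirectory() as dir_path:
       by_name_path = os.path.join(dir_path, 'foo_by_name.params')
       foo.collect_params().save(by_name_path)
       print('Artifact size via collect_params().save():',
             os.stat(by_name_path).st_size)
   ```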

