leezu commented on issue #16376: [RFC] Deferred compute in imperative interface to unify imperative and symbolic interface
URL: https://github.com/apache/incubator-mxnet/issues/16376#issuecomment-579529593
 
 
   > This seems to be a big change to the existing operator mode (imperative 
and symbolic).
   
   Essentially, the motivation for deferred compute is to extend imperative mode so that users can "construct a symbol" without using the symbolic API. This addresses the confusion around having two APIs and prevents divergence between the imperative and symbolic APIs. There is no need to drop the existing imperative / symbolic APIs because of deferred compute.
   
   > Could you please provide more information.
   
   Please ask a question and I'll answer ;)
   
   > AFAIK, symbolic API already does deferred init, imperative API is provided 
to improve user experience. Based on this RFC, what's the advantage of this new 
deferred_compute mode? As a user, when should I use it or not.
   
   Based on deferred compute, we can simplify the `gluon.HybridBlock` API so that it matches the `gluon.Block` API. For example, consider reimplementing `Dense(HybridBlock)` on top of the extended, deferred-compute-based `HybridBlock` API:
   
   ``` python
    class Dense(HybridBlock):
        def __init__(self, units, use_bias=True, flatten=True,
                     dtype='float32', weight_initializer=None,
                     bias_initializer='zeros', in_units=0):
            super().__init__()
            self._flatten = flatten
            self._units = units
            self.weight = gluon.Parameter(shape=(units, in_units),
                                          init=weight_initializer, dtype=dtype,
                                          allow_deferred_init=True)
            if use_bias:
                self.bias = gluon.Parameter(shape=(units,),
                                            init=bias_initializer, dtype=dtype,
                                            allow_deferred_init=True)
            else:
                self.bias = None

        def forward(self, x):  # We allow users to override forward() directly.
            ctx = x.context
            bias = self.bias.data(ctx) if self.bias is not None else None
            return npx.FullyConnected(x, self.weight.data(ctx), bias,
                                      no_bias=self.bias is None,
                                      num_hidden=self._units,
                                      flatten=self._flatten, name='fwd')
   ```
   
   `HybridBlock` can wrap the execution of `forward` in a deferred compute session, obtain a symbolic representation of the computation, and pass it to `CachedOp`.
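   
   A rough sketch of that wrapping is below; `deferred_compute()` and `symbol_from_outputs()` are placeholder names standing in for the deferred compute entry points proposed in this RFC, not final API:
   
    ``` python
    # Hypothetical sketch of what a hybridized HybridBlock could do on its first call.
    # deferred_compute() and symbol_from_outputs() are placeholders for the deferred
    # compute API proposed here, not existing MXNet functions.
    def _trace_forward(block, *args):
        with deferred_compute():                  # defer execution and record operator calls
            outputs = block.forward(*args)        # the user-defined imperative forward runs unmodified
        sym = symbol_from_outputs(args, outputs)  # recover the symbolic graph from the recording
        # HybridBlock would cache sym and route subsequent calls through CachedOp.
        return sym
    ```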
   
   There would be no reason for users to use the deferred compute API explicitly.
   
   > Another question. We all know deferred init cause bad user experience when 
it comes to debugging. Would this RFC address the debuggability issue?
   
   This RFC is orthogonal to deferred init. When updating the `gluon.HybridBlock` API based on deferred compute, one option is to require statically known weight shapes at construction time **if** users implement `def forward`. For backwards compatibility, we likely want to keep deferred init around for existing code that relies on `mx.sym` and implements `def hybrid_forward`.
   
   However, the other option is to allow deferred initialization of weights and 
require users to implement `infer_shape`:
   
   
https://github.com/apache/incubator-mxnet/blob/910c608f682a47fc2c43375b5f5a426b563e5821/python/mxnet/gluon/block.py#L1073-L1075
   
   This works around the failures of symbolic shape inference for deferred init in the case of dynamic shape ops, while still allowing users to decide the shape of the weight at the first forward pass.
   
   In the example above, it could look like:
   
   ``` python
    class Dense(HybridBlock):
        def __init__(self, units, use_bias=True, flatten=True,
                     dtype='float32', weight_initializer=None,
                     bias_initializer='zeros', in_units=0):
            [...]

        def infer_shape(self, x):
            # Fill in the deferred in_units dimension from the first input seen.
            self.weight.shape = (self.weight.shape[0], x.shape[1])

        def forward(self, x):
            [...]
   ```
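   
   For illustration, using such a `Dense` block would then look no different from a plain `Block`. This is a hedged usage sketch assuming the extended API above is in place; `npx.set_np()`, `initialize()` and `hybridize()` are existing Gluon calls:
   
    ``` python
    import mxnet as mx
    from mxnet import npx
    npx.set_np()  # enable the numpy-compatible interface

    net = Dense(units=16)    # in_units stays 0, so the weight shape is deferred
    net.initialize()
    net.hybridize()          # first call would record forward under deferred compute
    y = net(mx.np.ones((4, 8)))
    print(net.weight.shape)  # (16, 8) once infer_shape has filled in the input dimension
    ```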
   
   > If it's about performance optimization, could we have some initial data of 
using this new deferred mode vs. existing imperative mode?
   
   There is the option to improve the performance of imperative mode by deferring the computation and optimizing the computational graph before performing it. But this is not the main motivation and I haven't optimized for this use case (yet). In the `gluon.HybridBlock` case, we only run with deferred compute once to construct the symbolic graph and then hand it over to `CachedOp` for optimized execution.
