taliesinb opened a new pull request #8949: New layer: split_like.
URL: https://github.com/apache/incubator-mxnet/pull/8949
 
 
   ## Description ##
   
   This PR introduces a `split_like` operator, similar to the existing `reshape_like` operator. The layer takes the first dim of the input shape (lhs) and splits it into two dims, which it takes from the first two dims of the reference shape (rhs).
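   
   A minimal shape-level sketch of the intended semantics, assuming the operator also ends up exposed under `mx.nd.split_like` once registered (the concrete shapes here are just made up for illustration):
   
```python
import mxnet as mx

lhs = mx.nd.zeros((12, 5))     # input; its first dim (12) will be split
rhs = mx.nd.zeros((3, 4, 7))   # reference; only its first two dims (3, 4) are used
out = mx.nd.split_like(lhs=lhs, rhs=rhs)
print(out.shape)               # expected: (3, 4, 5)
```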
   
   This layer implements a simple operation, but that operation has a big payoff for high-level frameworks that want to do fast bucketing. Fast bucketing requires avoiding unrolling / compiling the network more than once. On GPU we have layers like the `RNN` layer that make this possible, but an additional requirement is that the sequence length *never* appear in the `shape` parameter of any `Reshape` op; otherwise we are forced to create new symbols instead of simply reshaping existing executors.
   
   Currently, there is a common idiom that unfortunately requires hardcoding the sequence length into a `Reshape`. `split_like` allows us to implement this idiom without hardcoding the sequence length. The idiom is a way to efficiently map an operation over a sequence (a code sketch follows the list below):
   
   1) take a tensor, with shape `in1 = (seq, batch, d1, d2, ...)`
   2) merge the first two dims with `Reshape` using `shape=(-3,-2)` to give 
`in2 = (seq * batch, d1, d2, ...)`
   3) perform some ordinary op or ops that naturally map over the batch dimension to give `out1 = (seq * batch, e1, e2, ...)`
   4) split the first dimension of the output back into two to give `out2 = 
(seq, batch, e1, e2, ...)`
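   
   As a rough sketch of this idiom with existing ops (using `FullyConnected` as the mapped op; the shapes and the choice of op are just for illustration), note how step 4 forces a shape constant into the graph:
   
```python
import mxnet as mx

seq_len, batch, d, e = 10, 32, 64, 128

in1 = mx.sym.Variable('in1')                     # (seq, batch, d)
in2 = mx.sym.Reshape(in1, shape=(-3, -2))        # merge first two dims -> (seq * batch, d)
out1 = mx.sym.FullyConnected(in2, num_hidden=e)  # maps over merged dim -> (seq * batch, e)
# step 4: split the first dim back out; today this hardcodes seq_len
out2 = mx.sym.Reshape(out1, shape=(-4, seq_len, -1, -2))  # -> (seq, batch, e)
```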
   
   `seq` doesn't have to come before `batch` (it could be the other way around), but the point is the same: in step 4 we need to reverse the split we performed in step 2. Unfortunately, we cannot hardcode either the batch size or the maximum sequence length into a `Reshape` layer, because doing so prevents an existing executor from being reshaped when we wish to change that parameter. And with current MXNet ops there is no simple way to go from `seq * batch` back to `(seq, batch)` without knowing at least one of them and hardcoding it into the `Reshape`.
   
   `split_like` makes this easy: you can simply use the original tensor `in1` as the reference tensor, giving `out2 = split_like(lhs=out1, rhs=in1)`.
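   
   With the new op, step 4 of the sketch above no longer mentions any shape constants. Assuming the operator is exposed as `mx.sym.split_like` with `lhs`/`rhs` arguments (mirroring `reshape_like`), it would look like:
   
```python
# step 4 with the proposed op: no hardcoded seq_len or batch, so the existing
# executor can simply be reshaped when either of them changes
out2 = mx.sym.split_like(lhs=out1, rhs=in1)  # -> (seq, batch, e)
```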
   
   When you have a large number of buckets, the performance savings from doing this can be enormous. It is possible to cache the original JSON and build a template from it so that new JSON can be produced more cheaply, but that is a complicated and hacky prospect compared to having the executor be naturally resizable, which `split_like` allows.
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   I get `ModuleNotFoundError: No module named 'cpplint'`, but the code is 
mostly copy-pasted from `reshape_like`, so it is probably ok?
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   `split_like` is almost a direct copy of `reshape_like`, which seems to have 
no direct test coverage. What should I do? 
   - [x] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [x] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   (Not sure how to fill in the above section)
   
   ## Comments ##
   This seems like the only simple way of achieving executor resizability for 
the idiom I mentioned, but maybe I'm missing something and it is already 
possible!
