Mu,

You’re mentioning plans for a new model format and compiler, but I don’t 
recall seeing them shared or discussed on the dev list. Can you share them, 
so it is easier for folks to understand the plan and vision?
 
Personally, I think it would be a shame to add ONNX support to MXNet, yet 
have it implemented outside of MXNet. At the end of the day, that makes 
things difficult for MXNet users.

Hagay

On 10/19/17, 10:01, "Mu Li" <[email protected] on behalf of [email protected]> 
wrote:

    I'm speaking with my "MXNet contributor" hat on.
    
    It would be sad if our new model format and compiler were not supported
    by our own contributors. It puts us in a bad position when we reach out
    to outside parties to ask for support.
    
    If you really want to do it the onnx <-> mxnet way, I suggest putting
    the code under https://github.com/aws.
    
    Best
    Mu
    
    On Thu, Oct 19, 2017 at 9:51 AM, Lupesko, Hagay <[email protected]> wrote:
    
    > Since there seems to be difficulty reaching a consensus here, and this
    > is a new area, maybe a good compromise would be to contribute this under
    > /contrib as experimental, in whatever way Roshani thinks makes sense.
    > Once there is code in place, and MXNet users and contributors are able to
    > check it out, we can consider future steps.
    >
    > Does this proposal make sense to folks?
    >
    > On 10/18/17, 23:01, "Tianqi Chen" <[email protected] on behalf of
    > [email protected]> wrote:
    >
    >     I want to offer one last thing in terms of technical details. I
    >     mentioned two trends in deep learning systems. There is one last
    >     thing that was omitted: how should we build a good deployment end
    >     for deep learning models?
    >
    >     There is always a paradox in this problem:
    >
    >     - On one hand, the deployment end needs to be lightweight and
    >     portable.
    >     - On the other hand, we want a lot of optimizations (memory layout,
    >     compute) and feature support, which makes the project big.
    >
    >     All the existing systems suffer from this problem. The solution is
    >     simple: separate the optimization part from the actual runtime and
    >     compile things down to a bare-metal module. This is the solution the
    >     nnvm/top compiler pipeline offers, which I believe will become a
    >     standard practice of deployment and where all systems will go.
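    >     As a concrete illustration of that split, here is a minimal sketch
    >     (the input shape is assumed, and nnvm_graph/params are whatever a
    >     frontend converter produced):
    >
    >         import nnvm.compiler
    >         import tvm
    >         from tvm.contrib import graph_runtime
    >
    >         # heavy part: optimize and compile the graph ahead of time
    >         shape = {"data": (1, 3, 224, 224)}  # assumed input shape
    >         graph, lib, params = nnvm.compiler.build(
    >             nnvm_graph, target="llvm", shape=shape, params=params)
    >
    >         # light part: the bare-metal runtime only loads and runs it
    >         rt = graph_runtime.create(graph, lib, tvm.cpu(0))
    >         rt.set_input(**params)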
    >
    >     Tianqi
    >
    >     On Wed, Oct 18, 2017 at 10:03 PM, Tianqi Chen <
    >     [email protected]> wrote:
    >
    >     > OK, there is some miscommunication here, I guess. We only need to
    >     > do a "canonicalization" step in the Python API that does a
    >     > symbol-to-symbol translation. It can be done purely in Python, and
    >     > there is no need to go "down" into C++ to do this.
    >     >
    >     > For example, the current nnvm.from_mxnet API takes a Module or
    >     > Gluon module and gets you back an nnvm/top graph in Python.
    >     >
    >     > All we are asking for is to decompose it into:
    >     >
    >     >     def mxnet_to_onnx(module):
    >     >         nnvm_graph, params = nnvm_from_mxnet(module)
    >     >         onnx = nnvm_to_onnx(nnvm_graph, params)
    >     >         return onnx
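    >     > For instance, a minimal sketch of the first half, assuming the
    >     > current nnvm.frontend package (nnvm_to_onnx would be the new
    >     > piece to write):
    >     >
    >     >     import nnvm
    >     >
    >     >     def nnvm_from_mxnet(module):
    >     >         # symbol-to-symbol translation, purely in Python
    >     >         sym, params = nnvm.frontend.from_mxnet(module)
    >     >         return sym, params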
    >     >
    >     > This allows nnvm_from_mxnet to be reused for other purposes, like
    >     > a compilation API that produces deployable modules.
    >     >
    >     > Tianqi
    >     >
    >     > On Wed, Oct 18, 2017 at 9:55 PM, Lupesko, Hagay
    >     > <[email protected]> wrote:
    >     >
    >     >> Tianqi:
    >     >> Thanks for detailing the trends. I fully agree that ONNX is just
    >     >> a graph serialization format – nothing more, nothing less. I also
    >     >> think we all agree that this simple mechanism holds lots of value
    >     >> for DL users, since it allows them to move between frameworks
    >     >> easily (e.g. train with MXNet, deploy on a mobile device with
    >     >> Caffe2, or the other way around).
    >     >> As you said, an in-memory IR is different from a serialization
    >     >> format such as ONNX. In-memory IRs are designed to make runtime
    >     >> execution as efficient as possible, leveraging software and
    >     >> hardware optimizations. They are indeed complex, and where the
    >     >> “meat” is.
    >     >> (BTW, ONNX regards itself as an “IR” format, but not in the same
    >     >> sense as NNVM.)
    >     >>
    >     >> At the end of the day, Roshani is aiming to deliver a simple
    >     >> functionality to MXNet users: (1) take an ONNX file and load it
    >     >> into MXNet, so you get a graph+weights you can work with; (2)
    >     >> given a trained model, save it as an ONNX file. Since MXNet users
    >     >> do not interact with NNVM directly, but rather interact with the
    >     >> MXNet API (MXNet Module), isn’t the simplest thing to do just to
    >     >> construct the Module “on the fly” using the MXNet API? Taking the
    >     >> other approach, we would go from the top-level MXNet “load” API,
    >     >> go “down” to NNVM to construct the graph, and go back up to MXNet
    >     >> to expose it as a Module. This seems too complex and does not add
    >     >> any benefit. In whatever way we construct the MXNet Module object,
    >     >> NNVM will always be the underlying in-memory IR that is executed,
    >     >> so why not take the simpler route?
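    >     >> A rough sketch of that simpler route (onnx_to_symbol and the
    >     >> input shape are hypothetical placeholders for the conversion
    >     >> Roshani would implement):
    >     >>
    >     >>     import mxnet as mx
    >     >>
    >     >>     def import_onnx(onnx_file):
    >     >>         # parse the ONNX protobuf and map each node onto MXNet
    >     >>         # symbolic operators (onnx_to_symbol is the hypothetical
    >     >>         # converter to be written), collecting the weights
    >     >>         sym, arg_params, aux_params = onnx_to_symbol(onnx_file)
    >     >>         # construct the Module "on the fly" via the MXNet API
    >     >>         mod = mx.mod.Module(symbol=sym, data_names=['data'],
    >     >>                             label_names=None)
    >     >>         mod.bind(data_shapes=[('data', (1, 3, 224, 224))])
    >     >>         mod.set_params(arg_params, aux_params)
    >     >>         return mod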
    >     >>
    >     >> Hagay
    >     >>
    >     >> On 10/18/17, 19:42, "Tianqi Chen" <[email protected] on behalf of
    >     >> [email protected]> wrote:
    >     >>
    >     >>     Hi Chris:
    >     >>
    >     >>     There is no intention to move things away from mxnet. The
    >     >>     reduction in lines of code comes from having a better design
    >     >>     in general; usually, you write less redundant code by
    >     >>     benefiting from a better design. As I may quote: "the best
    >     >>     design is achieved not when there is nothing to add, but when
    >     >>     there is nothing to be taken away."
    >     >>
    >     >>     MXNet has always benefited from this philosophy and improves
    >     >>     with new designs and proper modularization. For example, we
    >     >>     saw such reduction and convenience when migrating from
    >     >>     MXNet's legacy operators to NNVM's mechanism. The new
    >     >>     mechanism now enables things like sparse-aware support and
    >     >>     other features which would otherwise be much harder to
    >     >>     support.
    >     >>
    >     >>     The nnvm/tvm stack brings the same benefit (if not more) and
    >     >>     will only add more features to MXNet itself: offering more
    >     >>     hardware backends and optimizations, and allowing us to write
    >     >>     less code and spend less time optimizing for each backend by
    >     >>     going through TVM.
    >     >>
    >     >>     Tianqi
    >     >>
    >     >>     On Wed, Oct 18, 2017 at 7:15 PM, Chris Olivier <
    >     >>     [email protected]> wrote:
    >     >>
    >     >>     > Reduce the code base of mxnet? By increasing the scope of
    >     >>     > the dmlc modules? Is the intent to make mxnet a thin
    >     >>     > language wrapper around a group of dmlc modules?
    >     >>     >
    >     >>     >
    >     >>     > On Wed, Oct 18, 2017 at 6:58 PM Tianqi Chen <
    >     >>     > [email protected]> wrote:
    >     >>     >
    >     >>     > > To better answer Hagay's question, I would like to dive a
    >     >>     > > bit deeper into the relation between MXNet, NNVM and model
    >     >>     > > exchange formats like ONNX.
    >     >>     > >
    >     >>     > > There are two major trends in deep learning systems now:
    >     >>     > >
    >     >>     > > - Common serializable formats, like ONNX and CoreML, that
    >     >>     > > define the model exchange format.
    >     >>     > > - The in-memory graph IR for quick optimization and JIT.
    >     >>     > > NNVM and Tensorflow's XLA fall into this category.
    >     >>     > >
    >     >>     > > The exchange formats are great: they only pose a layer of
    >     >>     > > conversion, which is good for exchange. The real meat still
    >     >>     > > comes from the compilation and JIT pipeline you have to
    >     >>     > > offer. For that, we need an in-memory IR, because the cost
    >     >>     > > of constructing and serializing exchange formats like
    >     >>     > > protobuf can be high. And usually, the exchange formats are
    >     >>     > > designed in a minimalistic fashion, making it less easy to
    >     >>     > > extend them with more information to support in-depth
    >     >>     > > optimizations like automatic quantization or accelerator
    >     >>     > > support.
    >     >>     > >
    >     >>     > > The current MXNet relies on NNVM for in-memory IR
    >     >>     > > manipulation but does not contain a compilation component
    >     >>     > > that compiles to the hardware backends. Exporting to an
    >     >>     > > exchange format and then back into NNVM to run the
    >     >>     > > compilation poses more of a burden than a JIT compiler can
    >     >>     > > pay. Using the same in-memory graph IR as the compilation
    >     >>     > > stack gives a much greater advantage in this regard.
    >     >>     > >
    >     >>     > > The newly introduced nnvm/top and compiler offer in-memory
    >     >>     > > graph optimization and compilation, and offer more hardware
    >     >>     > > backends directly via TVM. We already see promising results
    >     >>     > > in edge deployments with a much lower runtime overhead. We
    >     >>     > > will quickly benefit further from more graph optimizations
    >     >>     > > that it has to offer.
    >     >>     > >
    >     >>     > > Building support around this new paradigm offers us the
    >     >>     > > advantage of being future compatible and takes full benefit
    >     >>     > > of the points I mentioned above.
    >     >>     > >
    >     >>     > > Tianqi
    >     >>     > >
    >     >>     > >
    >     >>     > >
    >     >>     > > On Wed, Oct 18, 2017 at 4:57 PM, Lupesko, Hagay
    >     >>     > > <[email protected]> wrote:
    >     >>     > >
    >     >>     > > > Roshani – this is an exciting initiative. ONNX support
    >     >>     > > > on MXNet will enable more users to ramp up on MXNet,
    >     >>     > > > which is great.
    >     >>     > > >
    >     >>     > > > Tianqi – a few questions and thoughts about your note:
    >     >>     > > > - “More hardware backends to mxnet” – MXNet users get the
    >     >>     > > > same benefit of HW support by implementing ONNX import on
    >     >>     > > > top of MXNet symbolic, right?
    >     >>     > > > - “NNVM Compiler now received contributions from AWS, UW
    >     >>     > > > and many other folks in MXNet community.” – agreed it is
    >     >>     > > > ramping up, but when you look at the data, it is clear
    >     >>     > > > that it is very early on for NNVM. Looking at the repo,
    >     >>     > > > it has 223 commits overall and 0 releases; compare that
    >     >>     > > > to MXNet with 6136 commits and 32 releases. For a more
    >     >>     > > > reliable initial implementation, building the import on
    >     >>     > > > top of MXNet is easier, faster and safer. MXNet has lots
    >     >>     > > > of users already using the Symbolic API, which hopefully
    >     >>     > > > means it is a mature API that is not likely to have
    >     >>     > > > breaking changes or major issues.
    >     >>     > > >
    >     >>     > > > I’m supportive of option 1 proposed by Roshani (building
    >     >>     > > > serde on top of MXNet symbolic), but to do it as an
    >     >>     > > > encapsulated implementation detail, so the implementation
    >     >>     > > > can be migrated to NNVM or another implementation in the
    >     >>     > > > future, if at that point it seems like the right thing to
    >     >>     > > > do.
    >     >>     > > >
    >     >>     > > > Interested in hearing other opinions though…
    >     >>     > > >
    >     >>     > > > Hagay
    >     >>     > > >
    >     >>     > > > On 10/18/17, 14:13, "Tianqi Chen" <[email protected]
    >     >>     > > > on behalf of [email protected]> wrote:
    >     >>     > > >
    >     >>     > > >     I am strongly recommending going through nnvm/top.
    >     >>     > > >     One major reason is that support for the nnvm/top
    >     >>     > > >     layer does NOT ONLY mean model-format compatibility
    >     >>     > > >     with onnx. These are the major benefits:
    >     >>     > > >
    >     >>     > > >     - More hardware backends for mxnet, including opencl,
    >     >>     > > >     metal, Raspberry Pi and the web browser. These are
    >     >>     > > >     automatically enabled by going through this layer. In
    >     >>     > > >     general, we designed the nnvm/tvm stack to resolve
    >     >>     > > >     the challenge of mxnet's current weakness in
    >     >>     > > >     deploying to more hardware backends.
    >     >>     > > >
    >     >>     > > >     - More frontend capabilities: nnvm's gluon-style IR
    >     >>     > > >     now ingests from CoreML and ONNX, and in the future
    >     >>     > > >     keras. Supporting those will reduce the amount of
    >     >>     > > >     engineering effort needed.
    >     >>     > > >
    >     >>     > > >     - Future compatibility. We all agree that the future
    >     >>     > > >     is migrating to gluon's API. NNVM/top tries to look
    >     >>     > > >     ahead by directly adopting the symbolic API to be
    >     >>     > > >     gluon's.
    >     >>     > > >
    >     >>     > > >     I would also like to correct some of the mentioned
    >     >>     > > >     facts with regard to the nnvm/tvm stack:
    >     >>     > > >
    >     >>     > > >     1.  Nascent project with few contributors
    >     >>     > > >
    >     >>     > > >     The NNVM compiler has now received contributions from
    >     >>     > > >     AWS, UW and many other folks in the MXNet community.
    >     >>     > > >     NNVM itself is already being used by MXNet. MXNet's
    >     >>     > > >     internal IR is migrating toward gluon, with its final
    >     >>     > > >     form being nnvm/top.
    >     >>     > > >
    >     >>     > > >     2.  Does not support all operators that exist in
    >     >>     > > >     MXNet Symbolic API
    >     >>     > > >
    >     >>     > > >     Neither NNVM/top nor onnx supports all operators that
    >     >>     > > >     exist in the mxnet symbolic API. The end goal here is
    >     >>     > > >     mainly to make nnvm/top onnx-compatible, which is a
    >     >>     > > >     more reasonable goal.
    >     >>     > > >
    >     >>     > > >     3.  No CI Pipeline and testcases
    >     >>     > > >
    >     >>     > > >     The NNVM compiler already contains unit tests and is
    >     >>     > > >     CI-tested with integration at
    >     >>     > > >     https://github.com/dmlc/nnvm, with a CI pipeline that
    >     >>     > > >     is well tested on CPU and GPU cases for the
    >     >>     > > >     front-ends.
    >     >>     > > >
    >     >>     > > >     Tianqi
    >     >>     > > >
    >     >>     > > >
    >     >>     > > >     On Wed, Oct 18, 2017 at 1:41 PM, Roshani Nagmote
    >     >>     > > >     <[email protected]> wrote:
    >     >>     > > >
    >     >>     > > >     > Hi guys,
    >     >>     > > >     >
    >     >>     > > >     >
    >     >>     > > >     > I am working on supporting ONNX
    >     >>     > > >     > <https://github.com/onnx/onnx> pre-trained models
    >     >>     > > >     > in Apache MXNet and would like to seek your opinion
    >     >>     > > >     > on the choice of implementation. I have also
    >     >>     > > >     > created a GitHub issue
    >     >>     > > >     > <https://github.com/apache/incubator-mxnet/issues/8319>.
    >     >>     > > >     > Supporting ONNX in MXNet will enable users to move
    >     >>     > > >     > between frameworks with their models; it will also
    >     >>     > > >     > enable the MXNet project to be part of the ONNX
    >     >>     > > >     > open standard and to steer the direction of ONNX.
    >     >>     > > >     >
    >     >>     > > >     >
    >     >>     > > >     > For those who don’t know ONNX: ONNX is an open
    >     >>     > > >     > source format for AI models which enables models to
    >     >>     > > >     > be transferred between frameworks. Refer to
    >     >>     > > >     > https://github.com/onnx/onnx for more details.
    >     >>     > > >     >
    >     >>     > > >     >
    >     >>     > > >     > To implement the import/export functionality in
    >     >>     > > >     > MXNet, I propose to expose an MXNet python module
    >     >>     > > >     > “serde” (name taken from the Apache Hive project)
    >     >>     > > >     > with the following methods supporting different
    >     >>     > > >     > formats:
    >     >>     > > >     >
    >     >>     > > >     > sym, params = mxnet.serde.import(other_format_file,
    >     >>     > > >     >     other_format=‘onnx’)
    >     >>     > > >     >
    >     >>     > > >     > other_format_file = mxnet.serde.export(mxnet_sym,
    >     >>     > > >     >     mxnet_params, ‘onnx’)
    >     >>     > > >     >
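    >     >>     > > >     > (One caveat: “import” is a reserved word in
    >     >>     > > >     > Python, so the final method would need a different
    >     >>     > > >     > name.) A hypothetical usage sketch, with
    >     >>     > > >     > import_model standing in for the import method:
    >     >>     > > >     >
    >     >>     > > >     >     import mxnet
    >     >>     > > >     >
    >     >>     > > >     >     # load an ONNX model as a symbol + parameters
    >     >>     > > >     >     # (import_model is a hypothetical name)
    >     >>     > > >     >     sym, params = mxnet.serde.import_model(
    >     >>     > > >     >         'model.onnx', other_format='onnx')
    >     >>     > > >     >
    >     >>     > > >     >     # save a trained MXNet model out as ONNX
    >     >>     > > >     >     onnx_file = mxnet.serde.export(sym, params,
    >     >>     > > >     >                                    'onnx')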
    >     >>     > > >     >
    >     >>     > > >     > The implementation under the hood can be done in
    >     >>     > > >     > two ways:
    >     >>     > > >     >
    >     >>     > > >     > 1) Implement at the MXNet layer, by parsing the
    >     >>     > > >     > ONNX model (in protobuf format), turning it into
    >     >>     > > >     > MXNet symbolic operators, and building the MXNet
    >     >>     > > >     > model directly. Similarly, I can convert the MXNet
    >     >>     > > >     > model to the ONNX format at this layer.
    >     >>     > > >     >
    >     >>     > > >     > 2) The DMLC community has released the nnvm/tvm
    >     >>     > > >     > compiler and an intermediate representation of the
    >     >>     > > >     > models; refer to
    >     >>     > > >     > <http://www.tvmlang.org/2017/10/06/nnvm-compiler-announcement.html>
    >     >>     > > >     >
    >     >>     > > >     > Based on the conversation on the GitHub issue
    >     >>     > > >     > <https://github.com/apache/incubator-mxnet/issues/8319>
    >     >>     > > >     > I opened, Mu mentioned that MXNet would use
    >     >>     > > >     > nnvm/tvm as the backend in the future.
    >     >>     > > >     >
    >     >>     > > >     > We could hook into this layer to implement the
    >     >>     > > >     > import/export functionality. nnvm/tvm already has
    >     >>     > > >     > ONNX 0.1 import implemented.
    >     >>     > > >     >
    >     >>     > > >     > For import:
    >     >>     > > >     >
    >     >>     > > >     >    1. I will need to enhance nnvm/tvm’s importer
    >     >>     > > >     >       to support ONNX 0.2.
    >     >>     > > >     >    2. Implement nnvm/tvm->mxnet symbolic
    >     >>     > > >     >       operators.
    >     >>     > > >     >
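    >     >>     > > >     > Composed, the import direction would look roughly
    >     >>     > > >     > like this (nnvm_to_mxnet is the new piece from
    >     >>     > > >     > step 2; the from_onnx call assumes the existing
    >     >>     > > >     > nnvm.frontend importer):
    >     >>     > > >     >
    >     >>     > > >     >     import onnx
    >     >>     > > >     >     import nnvm
    >     >>     > > >     >
    >     >>     > > >     >     def import_onnx_via_nnvm(onnx_file):
    >     >>     > > >     >         # step 1: enhanced nnvm/tvm ONNX importer
    >     >>     > > >     >         graph, params = nnvm.frontend.from_onnx(
    >     >>     > > >     >             onnx.load(onnx_file))
    >     >>     > > >     >         # step 2: translate nnvm/top back to mxnet
    >     >>     > > >     >         # (new, to be written)
    >     >>     > > >     >         return nnvm_to_mxnet(graph, params)
    >     >>     > > >     >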
    >     >>     > > >     > For export:
    >     >>     > > >     >
    >     >>     > > >     >    1. mxnet->nnvm/tvm (nnvm/tvm provides this
    >     >>     > > >     >       implementation already).
    >     >>     > > >     >    2. I will need to implement nnvm/tvm->onnx.
    >     >>     > > >     >
    >     >>     > > >     >
    >     >>     > > >     > These are the pros and cons I see in the above
    >     >>     > > >     > approaches:
    >     >>     > > >     >
    >     >>     > > >     > 1. Import/export at the mxnet layer
    >     >>     > > >     >
    >     >>     > > >     > Pros:
    >     >>     > > >     >
    >     >>     > > >     >    1. Stable APIs currently used by users.
    >     >>     > > >     >    2. Larger Apache MXNet community of
    >     >>     > > >     >       contributors.
    >     >>     > > >     >    3. CI pipeline to catch bugs.
    >     >>     > > >     >    4. Comparatively less time to implement and put
    >     >>     > > >     >       it in the hands of the users.
    >     >>     > > >     >
    >     >>     > > >     > Cons:
    >     >>     > > >     >
    >     >>     > > >     >    1. In the future we may have to reimplement at
    >     >>     > > >     >       the nnvm/tvm layer, in case MXNet moves to
    >     >>     > > >     >       the nnvm/tvm backend (assuming it will
    >     >>     > > >     >       move).
    >     >>     > > >     >
    >     >>     > > >     >
    >     >>     > > >     >
    >     >>     > > >     > 2. Import/export at the nnvm/tvm layer
    >     >>     > > >     >
    >     >>     > > >     > Pros:
    >     >>     > > >     >
    >     >>     > > >     >    1. Less engineering work in case mxnet moves to
    >     >>     > > >     >       nnvm/tvm.
    >     >>     > > >     >    2. nnvm/tvm would become a hub for converting
    >     >>     > > >     >       to different formats.
    >     >>     > > >     >    3. nnvm operators are more in parity with
    >     >>     > > >     >       mxnet’s gluon APIs; this could be useful in
    >     >>     > > >     >       case Gluon becomes the only standard that
    >     >>     > > >     >       MXNet will support.
    >     >>     > > >     >
    >     >>     > > >     > Cons:
    >     >>     > > >     >
    >     >>     > > >     >    1. Nascent project with few contributors.
    >     >>     > > >     >    2. Does not support all operators that exist in
    >     >>     > > >     >       the MXNet Symbolic API.
    >     >>     > > >     >    3. No CI pipeline.
    >     >>     > > >     >    4. The current Apache MXNet project does not
    >     >>     > > >     >       use the nnvm/tvm backend.
    >     >>     > > >     >    5. The mxnet->nnvm/tvm backend needs more
    >     >>     > > >     >       testing and user feedback.
    >     >>     > > >     >
    >     >>     > > >     >
    >     >>     > > >     > Any suggestions on either of these approaches?
    >     >>     > > >     > From a user's perspective, this will be an
    >     >>     > > >     > implementation detail that is not exposed.
    >     >>     > > >     >
    >     >>     > > >     > Thanks,
    >     >>     > > >     >
    >     >>     > > >     > Roshani
    >     >>     > > >     >
    >     >>     > > >
    >     >>     > > >
    >     >>     > > >
    >     >>     > > >
    >     >>     > >
    >     >>     >
    >     >>
    >     >>
    >     >>
    >     >>
    >     >
    >
    >
    >
    >
    

