Since this is a long discussion thread, I will summarize my point and again
clarify some technical considerations here:

Technical Reasoning:

- Model exchange formats like CoreML and ONNX are not lossless or complete.
They are designed to contain a core set of minimum operators needed to
support common inference tasks (ResNet, etc.). So you cannot rely on
bi-directional serialization through these formats for all MXNet models.
As a simple example, broadcast add/mul is simply not supported in ONNX (see
the short sketch after this list).

- The same problem applies to compilation and the in-memory IR: only a core
set of the most important primitives is effectively supported.

- Whether we are supporting an exchange format or an in-memory IR, we need
to decide which core set of operators we are interested in supporting. We
cannot simply say "let us support everything" from the beginning, given the
limitations of the exchange formats.

- It is crucial for us to articulate the core set of operators we care
about in MXNet, both to provide guidelines to the community and to
influence the design of the model exchange formats themselves to move in
favor of MXNet.

- nnvm/top is that initial core set of operators, for both compiler support
and exchange purposes. It is modeled after numpy and Gluon, under the
supervision of Eric, Mu and me. It can be exchanged bi-directionally with
the current MXNet operators without loss of information.
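
To make the operator-coverage point concrete, here is a minimal sketch using
the MXNet symbolic API (the shapes are illustrative, and the ONNX limitation
is the one described in the first bullet above):

    import mxnet as mx

    a = mx.sym.Variable('a')          # e.g. shape (2, 3)
    b = mx.sym.Variable('b')          # e.g. shape (1, 3), broadcast along axis 0
    c = mx.sym.broadcast_add(a, b)    # has a direct nnvm/top counterpart
                                      # (broadcast_add), but no faithful
                                      # equivalent in the ONNX operator set
                                      # discussed in this thread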

The Effort of Engineering:

- Because nnvm/top is modeled after numpy and Gluon, the mxnet <-> nnvm/top
conversion is quite easy, and we already have one direction done. I would be
very happy to answer any questions on the other direction. No information
loss will happen on this path (a rough sketch follows these bullets).

- mxnet/symbol or nnvm/symbol (they are essentially the same thing, with
slightly different op definitions) <- onnx is harder. There has already been
effort to support onnx 0.1, as Roshani mentioned, contributed by Zhi Zhang,
another Apache MXNet committer. Zhi has already provided code to ease this
process, and building on the existing effort would actually make the problem
easier.
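
As a rough sketch of how the two directions would compose
(nnvm.frontend.from_mxnet is the direction that already exists; the
nnvm/top -> onnx exporter, called nnvm_to_onnx below, is hypothetical and is
the part that would still need to be written):

    import nnvm

    def export_mxnet_to_onnx(sym, arg_params, aux_params):
        # existing direction: MXNet symbol -> nnvm/top graph (no information loss)
        nnvm_sym, params = nnvm.frontend.from_mxnet(sym, arg_params, aux_params)
        # direction still to be written: nnvm/top graph -> ONNX protobuf
        return nnvm_to_onnx(nnvm_sym, params)  # hypothetical exporter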


Tianqi

On Thu, Oct 19, 2017 at 9:51 AM, Lupesko, Hagay <lupe...@gmail.com> wrote:

> Since there seems to be difficulty reaching a consensus here, and this
> is a new area, maybe a good compromise would be to contribute this under
> /contrib as experimental, with whatever way Roshani thinks makes sense.
> Once there is code in place, and MXNet users and contributors are able to
> check it out, we can consider future steps.
>
> Does this proposal make sense to folks?
>
> On 10/18/17, 23:01, "Tianqi Chen" <workc...@gmail.com on behalf of
> tqc...@cs.washington.edu> wrote:
>
>     I want to offer one last thing in terms of technical details. I mentioned
>     two trends in deep learning systems, but there is one thing I omitted:
>     how we should build a good deployment end for deep learning models.
>
>     There is always a paradox to this problem:
>
>     - On one hand, the deployment end needs to be lightweight and portable.
>     - On the other hand, we want a lot of optimizations (memory layout,
>     compute) and feature support, which makes the project big.
>
>     All the existing systems suffer from this problem. The solution is simple:
>     separate the optimization part from the actual runtime and compile things
>     down to a bare-metal module. This is the solution the nnvm/top compiler
>     pipeline offers, which I believe will become standard practice for
>     deployment and where all systems are headed.
>
>     Tianqi
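
For reference, a minimal sketch of the compile-then-deploy flow described in
the quoted message above, assuming the nnvm.compiler / tvm Python APIs from
the nnvm compiler announcement; the toy model and file names are purely
illustrative:

    import mxnet as mx
    import nnvm.compiler
    import nnvm.frontend

    # a toy MXNet model standing in for a real trained network
    data = mx.sym.Variable("data")
    net = mx.sym.FullyConnected(data, num_hidden=10, name="fc1")
    mod = mx.mod.Module(net, label_names=None)
    mod.bind(data_shapes=[("data", (1, 100))], for_training=False)
    mod.init_params()
    arg_params, aux_params = mod.get_params()

    # front end: MXNet model -> nnvm/top graph
    nnvm_sym, params = nnvm.frontend.from_mxnet(net, arg_params, aux_params)

    # compile the graph down to a deployable module for a chosen backend
    graph, lib, params = nnvm.compiler.build(
        nnvm_sym, target="llvm", shape={"data": (1, 100)}, params=params)

    # the lightweight runtime end only needs these artifacts
    lib.export_library("deploy.so")
    with open("deploy.json", "w") as f:
        f.write(graph.json())
    with open("deploy.params", "wb") as f:
        f.write(nnvm.compiler.save_param_dict(params))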
>
>     On Wed, Oct 18, 2017 at 10:03 PM, Tianqi Chen <
> tqc...@cs.washington.edu>
>     wrote:
>
>     > OK, there is some miscommunication here, I guess. We only need to do a
>     > "canonicalization" step in the Python API that performs a symbol-to-symbol
>     > translation. It can be done purely in Python, and there is no need to go
>     > "down" into C++ to do this.
>     >
>     > For example, the current nnvm.from_mxnet API takes a Module or Gluon
>     > module and gets you back an nnvm/top graph in Python.
>     >
>     > All we are asking for is to decompose it into
>     >
>     > def mxnet_to_onnx(module):
>     >    nnvm_graph, params = nnvm_from_mxnet(module)
>     >    onnx = nnvm_to_onnx(nnvm_graph, params)
>     >    return onnx
>     >
>     > This allows nnvm_from_mxnet to be reused for other purposes, like a
>     > compilation API that produces deployable modules.
>     >
>     > Tianqi
>     >
>     > On Wed, Oct 18, 2017 at 9:55 PM, Lupesko, Hagay <lupe...@gmail.com>
> wrote:
>     >
>     >> Tianqi:
>     >> Thanks for detailing the trends. I fully agree that ONNX is just a
> graph
>     >> serialization format – nothing more, nothing less. I also think we
> all
>     >> agree that this simple mechanism holds lots of value to DL users
> since it
>     >> allows them to move between frameworks easily (e.g. train with
> MXNet,
>     >> deploy on a mobile device with Caffe2, or the other way around).
>     >> As you said, in-memory IRs are different from serialization formats such
>     >> as ONNX. They are designed to make runtime execution as efficient as
>     >> possible, leveraging software and hardware optimizations. They are indeed
>     >> complex, and that is where the “meat” is.
>     >> (BTW ONNX regards itself as an “IR” format, but not in the same
> sense as
>     >> NNVM).
>     >>
>     >> At the end of the day, Roshani is aiming to deliver a simple
>     >> functionality to MXNet users: (1) take an ONNX file, and load it
> into MXNet
>     >> so you get a graph+weights you can work with (2) Given a trained
> model,
>     >> save it as an ONNX file. Since MXNet users do not interact with NNVM
>     >> directly, but rather interact with the MXNet API (MXNet Module), isn’t
>     >> the simplest thing to do just to construct the Module “on the fly” using
>     >> the MXNet API? Taking the other approach, we would go from the top-level
>     >> MXNet “load” API, go “down” to NNVM to construct the graph, then go back
>     >> up to MXNet to expose it as a Module. This seems too complex and does not
>     >> add any
>     >> benefit. In whatever way we construct the MXNet Module object, NNVM
> will
>     >> always be the underlying in memory IR that is being executed, so
> why not
>     >> take the simpler route?
>     >>
>     >> Hagay
>     >>
>     >> On 10/18/17, 19:42, "Tianqi Chen" <workc...@gmail.com on behalf of
>     >> tqc...@cs.washington.edu> wrote:
>     >>
>     >>     Hi Chris:
>     >>
>     >>     There is no intention to move things away from mxnet. Lines of code
>     >>     are reduced by having a better design in general; usually, you write
>     >>     less redundant code by benefiting from a better design. As I may
>     >>     quote: "the best design is achieved not when there is nothing to add,
>     >>     but when there is nothing to be taken away."
>     >>
>     >>     MXNet has always benefited from this philosophy and improves with
>     >>     new designs and proper modularization. For example, we see such
>     >>     reduction and convenience happening when migrating from MXNet's
>     >>     legacy op to NNVM's mechanism. The new mechanism now enables things
>     >>     like sparse-aware support and other features which would otherwise be
>     >>     much harder to support.
>     >>
>     >>     The nnvm/tvm stack brings the same benefit (if not more) and it will
>     >>     only add more features to MXNet itself: offering more hardware backends
>     >>     and optimizations, and allowing us to write less code and spend less
>     >>     time optimizing for each backend by going through TVM.
>     >>
>     >>     Tianqi
>     >>
>     >>     On Wed, Oct 18, 2017 at 7:15 PM, Chris Olivier <
> cjolivie...@gmail.com
>     >> >
>     >>     wrote:
>     >>
>     >>     > Reduce code base of mxnet? By increasing scope of the dmlc
> modules?
>     >> Is the
>     >>     > intent to make mxnet a thin language wrapper around a group
> of dmlc
>     >>     > modules?
>     >>     >
>     >>     >
>     >>     > On Wed, Oct 18, 2017 at 6:58 PM Tianqi Chen <
>     >> tqc...@cs.washington.edu>
>     >>     > wrote:
>     >>     >
>     >>     > > To better answer Hagay's question, I would like to dive
> down a
>     >> bit deeper
>     >>     > > on the relation between MXNet, NNVM and model exchange
> format
>     >> like ONNX.
>     >>     > >
>     >>     > > There are two major trends in deep learning systems now:
>     >>     > >
>     >>     > > - Common serializable formats, like ONNX and CoreML, that define
>     >>     > > the model exchange format.
>     >>     > > - The in-memory graph IR for quick optimization and JIT. NNVM and
>     >>     > > Tensorflow's XLA fall into this category.
>     >>     > >
>     >>     > > The exchange formats are great; they only add a layer of
>     >>     > > conversion, which is good for exchange. The real meat still comes
>     >>     > > from the compilation and JIT pipeline you have to offer. For that,
>     >>     > > we will need an in-memory IR, because the cost of constructing and
>     >>     > > serializing could be high for exchange formats like protobuf. And
>     >>     > > usually, the exchange formats are designed in a minimalistic
>     >>     > > fashion, making it harder to attach the extra information needed
>     >>     > > to support in-depth optimizations like automatic quantization and
>     >>     > > accelerator support.
>     >>     > >
>     >>     > > The current MXNet relies on NNVM for in-memory IR manipulation
>     >>     > > but does not contain a compilation component that compiles to the
>     >>     > > hardware backends. Exporting to an exchange format and then
>     >>     > > importing back into NNVM to run the compilation imposes more
>     >>     > > burden than a JIT compiler can afford. Using the same in-memory
>     >>     > > graph IR as the compilation stack gives a much bigger advantage
>     >>     > > here.
>     >>     > >
>     >>     > > The newly introduced nnvm/top and compiler offer in-memory graph
>     >>     > > optimization and compilation, and offer more hardware backends
>     >>     > > directly via TVM. We already see promising results in edge
>     >>     > > deployments with much lower runtime overhead. We will quickly
>     >>     > > benefit further from the additional graph optimizations it has to
>     >>     > > offer.
>     >>     > >
>     >>     > > Building support around this new paradigm offers us the advantage
>     >>     > > of being future-compatible and takes full advantage of the points
>     >>     > > I mentioned above.
>     >>     > >
>     >>     > > Tianqi
>     >>     > >
>     >>     > >
>     >>     > >
>     >>     > > On Wed, Oct 18, 2017 at 4:57 PM, Lupesko, Hagay <
>     >> lupe...@gmail.com>
>     >>     > wrote:
>     >>     > >
>     >>     > > > Roshani – this is an exciting initiative, ONNX support on
> MXNet
>     >> will
>     >>     > > > enable more users to ramp up on MXNet, which is great.
>     >>     > > >
>     >>     > > > Tianqi – a few questions and thoughts about your note:
>     >>     > > > - “More hardware backends to mxnet” – MXNet users get the same
>     >>     > > > benefit of HW support by implementing ONNX import on top of
>     >>     > > > MXNet symbolic, right?
>     >>     > > > - “NNVM Compiler now received contributions from AWS, UW
> and
>     >> many other
>     >>     > > > folks in MXNet community.” – agreed it is ramping up, but
> when
>     >> you look
>     >>     > > at
>     >>     > > > the data, it is clear that it is very early on for NNVM.
>     >> Looking at the
>     >>     > > > repo, it has overall 223 commits, 0 releases. Compare it
> to
>     >> MXNet with
>     >>     > > 6136
>     >>     > > > commits and 32 releases. It seems to be still early on for
>     >> NNVM, and
>     >>     > for
>     >>     > > a
>     >>     > > > more reliable initial implementation building the import
> on top
>     >> of
>     >>     > MXNet
>     >>     > > is
>     >>     > > > easier, faster and safer. MXNet has lots of users already
> using
>     >> the
>     >>     > > > Symbolic API, which hopefully means it is a mature API that is
>     >>     > > > not likely to have breaking changes or major issues.
>     >>     > > >
>     >>     > > > I’m supportive of option 1 proposed by Roshani (building
> serde on
>     >> top of
>     >>     > > > MXNet symbolic), but to do it as an encapsulated
> implementation
>     >> detail,
>     >>     > > so
>     >>     > > > the implementation can be migrated to NNVM or another
>     >> implementation in
>     >>     > > the
>     >>     > > > future, if at that point it seems like the right thing to
> do.
>     >>     > > >
>     >>     > > > Interested in hearing other opinions though…
>     >>     > > >
>     >>     > > > Hagay
>     >>     > > >
>     >>     > > > On 10/18/17, 14:13, "Tianqi Chen" <workc...@gmail.com on
>     >> behalf of
>     >>     > > > tqc...@cs.washington.edu> wrote:
>     >>     > > >
>     >>     > > >     I am strongly recommending going through nnvm/top. One major
>     >>     > > >     reason is that supporting the nnvm/top layer means NOT ONLY
>     >>     > > >     compatibility of the model format with onnx. These are the
>     >>     > > >     major benefits:
>     >>     > > >
>     >>     > > >
>     >>     > > >     - More hardware backends to mxnet, including opencl, metal,
>     >>     > > >     Raspberry Pi, and the web browser. These are automatically
>     >>     > > >     enabled by going through this layer. In general, we designed
>     >>     > > >     the nnvm/tvm stack to resolve current mxnet's weakness in
>     >>     > > >     deploying to more hardware backends.
>     >>     > > >
>     >>     > > >     - More frontend capabilities: nnvm's gluon-style IR now
>     >>     > > >     ingests from CoreML, ONNX and, in the future, Keras.
>     >>     > > >     Supporting those will reduce the amount of engineering effort
>     >>     > > >     needed.
>     >>     > > >
>     >>     > > >     - Future compatibility. We all agree that the future is to
>     >>     > > >     migrate to gluon's API. NNVM/top tries to look ahead by
>     >>     > > >     directly adopting the symbolic API to be gluon.
>     >>     > > >
>     >>     > > >
>     >>     > > >     I would also like to correct some of the mentioned
> facts
>     >> with
>     >>     > regard
>     >>     > > to
>     >>     > > >     nnvm/tvm stack
>     >>     > > >
>     >>     > > >     1.   Nascent project with few contributors
>     >>     > > >
>     >>     > > >     NNVM Compiler has now received contributions from AWS, UW
>     >>     > > >     and many other folks in the MXNet community. NNVM itself is
>     >>     > > >     already being used by MXNet. MXNet's internal IR is migrating
>     >>     > > >     toward gluon, and its final form is nnvm/top.
>     >>     > > >
>     >>     > > >     3.   Does not support all operators that exist in
> MXNet
>     >> Symbolic
>     >>     > API
>     >>     > > >
>     >>     > > >     Neither NNVM/top nor onnx supports all operators that exist
>     >>     > > >     in the mxnet symbolic API. The end goal here is mainly to
>     >>     > > >     make nnvm/top onnx-compatible, which is a more reasonable
>     >>     > > >     goal.
>     >>     > > >
>     >>     > > >     4.  No CI Pipeline and testcases
>     >>     > > >
>     >>     > > >     NNVM already contains a compiler with unittests and CI-tested
>     >>     > > >     integration (https://github.com/dmlc/nnvm), with a CI pipeline
>     >>     > > >     that is well tested on CPU and GPU cases for the front-ends.
>     >>     > > >
>     >>     > > >     Tianqi
>     >>     > > >
>     >>     > > >
>     >>     > > >     On Wed, Oct 18, 2017 at 1:41 PM, Roshani Nagmote <
>     >>     > > > roshaninagmo...@gmail.com>
>     >>     > > >     wrote:
>     >>     > > >
>     >>     > > >     > Hi guys,
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     > I am working on supporting ONNX <
>     >> https://github.com/onnx/onnx>
>     >>     > > > pre-trained
>     >>     > > >     > models in Apache MXNet and would like to seek your
>     >> opinion on the
>     >>     > > > choice of
>     >>     > > >     > implementation. I also have created a GitHub issue
>     >>     > > >     > <https://github.com/apache/
> incubator-mxnet/issues/8319>.
>     >>     > > Supporting
>     >>     > > > ONNX
>     >>     > > >     > in
>     >>     > > >     > MXNet will enable users to move between frameworks with
>     >>     > > >     > their models; this will also enable the MXNet project to be
>     >>     > > >     > a part of the ONNX open standard and to steer the direction
>     >>     > > >     > of ONNX.
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     > For those who don’t know ONNX, ONNX is an open
> source
>     >> format for
>     >>     > AI
>     >>     > > > models
>     >>     > > >     > which enables models to be transferred between
>     >> frameworks. Refer
>     >>     > to
>     >>     > > >     > https://github.com/onnx/onnx for more details.
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     > To implement the import/export functionality in MXNet, I
>     >>     > > >     > propose to expose an MXNet python module “serde” (name taken
>     >>     > > >     > from the Apache Hive project) with the following methods
>     >>     > > >     > supporting different formats:
>     >>     > > >     >
>     >>     > > >     > sym, params = mxnet.serde.import(other_format_file,
>     >>     > > > other_format=‘onnx’)
>     >>     > > >     >
>     >>     > > >     > other_format_file =  mxnet.serde.export(mxnet_sym,
>     >> mxnet_params,
>     >>     > > > ‘onnx’)
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     > The implementation under the hood can be done in
> two ways:
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     > 1) Implement at the MXNet layer by parsing the ONNX model
>     >>     > > >     > (in protobuf format), turning it into MXNet Symbolic
>     >>     > > >     > operators, and building the MXNet model directly. Similarly,
>     >>     > > >     > I can convert the MXNet model to ONNX format at this layer.
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     > 2) The DMLC community has released the nnvm/tvm compiler
>     >>     > > >     > and an
>     >>     > > >     > intermediate representation of the models, refer:
>     >>     > > >     > http://www.tvmlang.org/2017/
> 10/06/nnvm/tvm-compiler-
>     >>     > > > announcement.html
>     >>     > > >     > <http://www.tvmlang.org/2017/10/06/nnvm-compiler-
>     >>     > announcement.html
>     >>     > > >
>     >>     > > >     >
>     >>     > > >     > Based on the conversation on the GitHub issue
>     >>     > > >     > <https://github.com/apache/
> incubator-mxnet/issues/8319> I
>     >>     > opened,
>     >>     > > Mu
>     >>     > > >     > mentioned that MXNet would use nnvm/tvm as the
> backend in
>     >> the
>     >>     > > future.
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     > We could hook into this layer to implement the
>     >> import/export
>     >>     > > > functionality.
>     >>     > > >     > nnvm/tvm has ONNX 0.1 version import implemented.
>     >>     > > >     >
>     >>     > > >     > For import,
>     >>     > > >     >
>     >>     > > >     >    1. I will need to enhance nnvm/tvm’s importer to support
>     >>     > > >     >       ONNX 0.2.
>     >>     > > >     >    2. Implement nnvm/tvm->mxnet symbolic operators.
>     >>     > > >     >
>     >>     > > >     > For export:
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     >    1. mxnet->nnvm/tvm (nnvm/tvm provides this implementation
>     >>     > > >     >       already).
>     >>     > > >     >    2. I will need to implement nnvm/tvm->onnx.
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     > These are the pros and cons I see in the above
> approaches:
>     >>     > > >     >
>     >>     > > >     >    1. Import/export at mxnet layer
>     >>     > > >     >
>     >>     > > >     > Pros:
>     >>     > > >     >
>     >>     > > >     >    1. Stable APIs currently used by users.
>     >>     > > >     >    2. Larger Apache MXNet community of contributors.
>     >>     > > >     >    3. CI pipeline to catch bugs.
>     >>     > > >     >    4. Comparatively less time to implement and put it in the
>     >>     > > >     >       hands of the users.
>     >>     > > >     >
>     >>     > > >     > Cons:
>     >>     > > >     >
>     >>     > > >     >    1. In the future we may have to reimplement at the
>     >>     > > >     >       nnvm/tvm layer, in case MXNet moves to the nnvm/tvm
>     >>     > > >     >       backend (assuming it will move).
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     >    2. Import/export at nnvm/tvm layer
>     >>     > > >     >
>     >>     > > >     > Pros:
>     >>     > > >     >
>     >>     > > >     >    1. Less engineering work in case mxnet moves to nnvm/tvm.
>     >>     > > >     >    2. nnvm/tvm would become a hub to convert to different
>     >>     > > >     >       formats.
>     >>     > > >     >    3. nnvm operators are more in parity with mxnet’s gluon
>     >>     > > >     >       APIs; this could be useful in case Gluon becomes the
>     >>     > > >     >       only standard that MXNet will support.
>     >>     > > >     >
>     >>     > > >     > Cons:
>     >>     > > >     >
>     >>     > > >     >    1. Nascent project with few contributors.
>     >>     > > >     >    2. Does not support all operators that exist in the MXNet
>     >>     > > >     >       Symbolic API.
>     >>     > > >     >    3. No CI pipeline.
>     >>     > > >     >    4. The current Apache MXNet project does not use the
>     >>     > > >     >       nnvm/tvm backend.
>     >>     > > >     >    5. mxnet->nnvm/tvm backend needs more testing and user
>     >>     > > >     >       feedback.
>     >>     > > >     >
>     >>     > > >     >
>     >>     > > >     > Any suggestions on either of these approaches? From the
>     >>     > > >     > user's perspective, this will be an implementation detail
>     >>     > > >     > that is not exposed.
>     >>     > > >     >
>     >>     > > >     > Thanks,
>     >>     > > >     >
>     >>     > > >     > Roshani
>     >>     > > >     >
>     >>     > > >
>     >>     > > >
>     >>     > > >
>     >>     > > >
>     >>     > >
>     >>     >
>     >>
>     >>
>     >>
>     >>
>     >
>
>
>
>
