Hi Pedro,

I really appreciate that a diligent and talented engineer eagerly wants to
improve our system, and I am very thankful that you have done so much for our
community. However, there are a few points that I believe I should raise.

While I agree with Tianqi that every design has its pros and cons, I would
like to emphasize that *good taste* in system design means optimizing the
bottleneck and enhancing expressiveness (and usability), i.e. doing what
needs doing, rather than polishing *trivial nits* that are irrelevant to
either performance or expressiveness. Generally speaking, typed or untyped,
shared_ptr or unique_ptr won't affect the overall performance of a deep
learning workload, especially when we have an async scheduler in MXNet that
does good latency hiding - to me, these are not major issues that are worth
re-designing our entire system over.
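
To make this concrete, here is a minimal sketch (illustrative only; Node,
SharedGraph, and UniqueGraph are made-up names, not MXNet code). The entire
difference between the two ownership models is a few atomic reference-count
operations per node:

    #include <memory>
    #include <vector>

    struct Node {};  // stand-in for a graph node

    // Refcounted ownership: copies of the graph share nodes; each copy
    // or destruction costs one atomic refcount operation per node.
    struct SharedGraph {
      std::vector<std::shared_ptr<Node>> nodes;
    };

    // Unique ownership: the graph is move-only; no refcount traffic,
    // but aliasing a node from elsewhere needs a raw pointer.
    struct UniqueGraph {
      std::vector<std::unique_ptr<Node>> nodes;
    };

Either way, this bookkeeping costs nanoseconds per node, while a single
operator kernel dispatched by the engine runs for microseconds or more.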

To benefit users - real-world ML practitioners - the main thing I would like
to mention is that a dataflow-graph-based representation is increasingly
incapable of expressing modern neural networks, because of increasingly
common structures like arbitrary control flow (w/ continue, break, etc.),
recursion, and type conjunction and disjunction. Addressing these pain
points should be our priority, and Relay is what addresses all of them.
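
As a concrete example of the kind of structure I mean (a hedged sketch in
pseudo-C++; Tensor, score, decoder_step, and kThreshold are hypothetical
names, not any real API):

    // Hypothetical stand-ins, for illustration only.
    struct Tensor {};
    double score(const Tensor&);         // runtime-computed metric
    Tensor decoder_step(const Tensor&);  // one step of a decoder
    const double kThreshold = 0.9;

    // A data-dependent loop: a static dataflow graph must fix its
    // topology before execution, but here the iteration count depends
    // on a runtime value, so the loop cannot be unrolled into a fixed
    // graph. An IR with first-class control flow represents it directly.
    Tensor decode(Tensor state) {
      while (score(state) < kThreshold) {
        state = decoder_step(state);
      }
      return state;
    }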

Another minor thing I would like to humbly mention is that, for the sake of
our brand, it is our responsibility to be professional about terminology
when writing an official proposal on Confluence. As one of numerous
examples, the title of the proposal took me aback for a while: a term like
"operator graph" reads strangely. Correct me if I am wrong, but the compiler
community would prefer the term "intermediate representation", and the
distributed-systems community would prefer "dataflow graph". If you are not
familiar with these fields, a better way to communicate efficiently is to
first familiarize yourself with the most basic concepts and then start the
discussion. This would save your own valuable time as well.

Again, thank you so much for your hard work, and I hope that we can work
together to win customers in the future :-)

Thanks,
Junru


On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <tqc...@cs.washington.edu>
wrote:

> The core part of the proposal is to move the graph to a much more strongly
> typed template class.
> I think this is mainly a point of engineering taste, and both sides have
> pros and cons. Let me list them before I share my thoughts on this issue
> (a sketch contrasting the two styles follows the list):
>
> - Typed fields certainly enjoy more compile-time type checking; on the
> other hand, it is hard to expose a template with an explosive number of
> instantiations to frontend languages.
> - More type-erased fields provide runtime flexibility to store polymorphic
> types as well as extensible attributes for graph optimization.
>   - It is hard to use a virtual class to expose every possible attribute
> that an operator might have, such as inlining, storage pattern, gradient,
> etc.
>   - The nature of supporting a growing set of operator attributes requires
> a type-erased attrs field.
> - In contrast to your argument (typing is a blocker to features),
> type-erased and typed code can both get to the same feature, except that
> typed code gets more compile-time errors while type-erased code gets some
> of them at runtime.
> - Templatized data structures will likely introduce an additional mental
> burden for developers and are not really suitable as a core data
> structure,
>   - because they imply an explosive number of possible data structures,
> while the core data structure should be a single one.
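>
> To illustrate the two styles concretely (a rough sketch, not actual NNVM
> or proposal code; the type names are made up):
>
>     #include <any>
>     #include <string>
>     #include <unordered_map>
>
>     // Typed style: every attribute set yields a distinct C++ type, so
>     // TypedGraph<ShapeAttrs>, TypedGraph<FusionAttrs>, ... multiply,
>     // and each instantiation would need its own frontend binding.
>     template <typename Attrs>
>     struct TypedGraph {
>       Attrs attrs;  // checked at compile time
>     };
>
>     // Type-erased style: one concrete graph type; attributes are looked
>     // up by name at runtime, so a single binding covers every pass.
>     struct ErasedGraph {
>       std::unordered_map<std::string, std::any> attrs;
>     };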
>
> Now my view (as an MXNet PMC member) on typed vs type-erased style: if
> MXNet were a pure C++ project, I might take more of the typed approach.
> However, MXNet itself is a project that supports python/scala/clojure and
> other frontend languages.
> The introduction of more typing may not align with that goal, given the
> tradeoffs I listed above.
>
> This proposal is really a drastic change to what NNVM does, as well as to
> the optimization passes, and given the scope it is, in your analogy, "a new
> vehicle to solve all the problems" rather than a minor patch. It will take
> a lot of engineering effort to bring in new features and adapt the existing
> ones.
> Because of that, it does merit a discussion about how we should think about
> the future MXNet 2.0.
>
> Technically, Relay is a serious candidate. Of course Relay, as well as its
> core, is in C++, but it maintains the multi-language-first principle; that
> is why the example code was in Python.
> See more related discussion comparing NNVMv1 and Relay:
> https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
>
> I think the ideal graph data structure candidate for MXNet 2.0 should have
> natural support for (a minimal sketch of such an IR core follows this
> list):
> - functions, modules, and recursion
> - control flow
> - interoperation with multi-language frontends, e.g. being able to
> prototype graph optimizations in python/scala/clojure if needed.
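>
> For instance, such an IR core could look roughly like this (a hypothetical
> sketch, not Relay's actual class hierarchy; the names are made up):
>
>     #include <memory>
>     #include <string>
>     #include <vector>
>
>     struct Expr { virtual ~Expr() = default; };
>     using ExprPtr = std::shared_ptr<Expr>;
>
>     // Functions, calls, and control flow are first-class IR nodes, so
>     // recursion and branching live directly in the representation
>     // rather than being bolted on as special operators.
>     struct Var      : Expr { std::string name; };
>     struct Function : Expr { std::vector<ExprPtr> params; ExprPtr body; };
>     struct Call     : Expr { ExprPtr fn; std::vector<ExprPtr> args; };
>     struct If       : Expr { ExprPtr cond, then_branch, else_branch; };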
>
> Adding this support needs significant engineering effort, and I do hope we
> only have to do it once. While I don't want to force any conclusion here,
> I do think Relay is one such candidate.
>
> Tianqi
>
>
> On Tue, May 14, 2019 at 5:58 PM Pedro Larroy
> <pedro.larroy.li...@gmail.com> wrote:
>
> > Hi Tianqi
> >
> > Thanks for the quick response.
> >
> > Could you point to examples where graph.h is being exposed in a way that
> > would not be possible with what I propose? I don't think my proposal has
> > any impact on language bindings, and as I describe it, it doesn't affect
> > having or not having higher-level language bindings. Please elaborate so
> > I can understand your concern. Maybe code examples where the graph
> > attributes are being changed from Python? I don't think we have this in
> > MXNet. This is such a core foundation for MXNet that I don't think we
> > should compromise on it because other projects not directly related to
> > MXNet might want to expose some untyped graph and Node attributes. The
> > current status makes maintaining the code very painful and is also
> > preventing desired features, such as higher-order gradients, from being
> > developed. I have heard from you many times how speed is critical for us
> > to innovate in this quickly changing field.
> >
> > My proposal is limited to the graph and wouldn't change, for example,
> > the way operators are registered or how their arguments are processed.
> >
> >
> > Regarding the second point, the Relay documentation that I found on the
> > web, for example:
> >
> > https://docs.tvm.ai/dev/relay_add_op.html#
> >
> > Is somebody working on making Imperative::Backward use this API? This
> > would be a big change which I'm not aware of. And adopting an IR is of a
> > much bigger scope than the change I'm proposing here.
> >
> > I think I'm having difficulty understanding the arguments here. I'm
> > saying I need to change one piece of my car, and what you are selling me
> > is a new vehicle? Or is your suggestion that we use Relay for the graph
> > passes in MXNet?
> >
> > I would like to see C++ code examples; Python examples are not
> > sufficient when we talk about the core of MXNet.
> >
> > Pedro.
> >
> > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <tqc...@cs.washington.edu>
> > wrote:
> > >
> > > Thanks for the proposal. Let me share some of my thoughts:
> > >
> > > Specific comments on the proposal
> > > -----------------------------------------------
> > > The heavy use of generics in the Graph type is a huge departure from
> > > the type-erased data structure which was presented in the previous
> > > design.
> > > While we understand the advantages of a typed language (more
> > > compile-time checking) and of type-erased types (more dynamism), the
> > > heavy use of templates will actually make the project solely
> > > C++-focused, making it hard to expose intermediate (templatized) data
> > > structures to other languages like python/scala/clojure.
> > >
> > > While I fully understand some of the lessons taught in programming
> > > C++ (reduce shared_ptr, more typing, etc.),
> > > we need to think about the context of the MXNet project and **the need
> > > to support multi-language as a first-class concern**.
> > > Some of the type-erased types are design trade-offs made to support
> > > these features, and we need to think more carefully instead of just
> > > applying "rules for C++", which may bring problems. A sketch of the
> > > binding problem follows below.
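> > >
> > > For example (an illustrative sketch, not MXNet's actual C API; the
> > > names are made up):
> > >
> > >     // A C FFI can only export concrete symbols. A single
> > >     // type-erased graph type needs one handle type and one
> > >     // attribute accessor, which every frontend can bind once:
> > >     extern "C" int GraphGetAttr(void* graph_handle,
> > >                                 const char* key, void** out);
> > >
> > >     // A templatized Graph<AttrsA>, Graph<AttrsB>, ... would instead
> > >     // need a separately exported set of functions per instantiation,
> > >     // which frontend languages cannot enumerate ahead of time.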
> > >
> > > Future of NNVM
> > > ----------------------
> > > Given that this thread has touched upon what we should do for better
> > > computational graph handling, I would also recommend taking a look at
> > > NNVMv2 -- Relay.
> > >
> > > Relay already addresses many of the wish-list items in the proposal,
> > > such as operator fusion, higher-order gradients, offloading to
> > > hardware, isolated compilation, deployment on edge devices and
> > > accelerators, etc.
> > > Relay also addresses problems not yet mentioned in the proposal,
> > > including control flow, a dynamic runtime, automatic layout
> > > optimization, etc.
> > >
> > > Tianqi
> > >
> > > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org> wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > Thanks for taking the initiative. Skimming through the design doc,
> > > > I didn't see a comparison with existing solutions such as Relay in
> > > > TVM, which is already a dependency of MXNet. Could you elaborate on
> > > > the comparison with existing solutions in the design doc too?
> > > >
> > > > -sz
> > > >
> > > > On 2019/05/14 23:49:30, Pedro Larroy <pedro.larroy.li...@gmail.com>
> > > > wrote:
> > > > > Hi dev@
> > > > >
> > > > > As a result of my deep dives into the graph machinery, I have
> > > > > created a new proposal to improve the operator graph in MXNet.
> > > > >
> > > > > This would mean superseding the use of NNVM Graph in MXNet and
> > > > > having a new implementation that we can use to simplify a lot of
> > > > > code and do powerful graph manipulation and passes such as
> > > > > operator fusion and other optimizations.
> > > > >
> > > > > As it would be a change with big impact and ramifications, your
> > > > > thoughts and feedback on the document would be highly appreciated,
> > > > > so that we can take potential future use cases into account:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > > >
> > > > > Pedro.
> > > > >
> > > >
> >
>
