Hi Anirudh, Naveen,

Thank you so much for the gentle reminder!
I am not a native speaker, and that led to the mistake. I would like to offer my sincere apologies to Pedro. Pedro is working really hard to grow our community and improve our code base. I sincerely apologize for what I said in a hurry. Let's work hard together to grow a healthy community!

Thanks,
Junru

On Wed, May 15, 2019 at 11:51 Naveen Swamy <mnnav...@gmail.com> wrote:

Being dismissive and condescending is exactly what has been plaguing this project.

I agree the last paragraph sounds very condescending and very dismissive, and it breaks many of the points in the code of conduct listed.

On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <anirudh2...@gmail.com> wrote:

Hi Junru,

Overall, I appreciate the points you made about the proposal.

Having said that, I would like to remind you of the Apache Code of Conduct: https://www.apache.org/foundation/policies/conduct. "Be empathetic, welcoming, friendly and patient."

I find your tone condescending. Clearly you understood what he meant from the context, whether you prefer to call it IR as in compilers or dataflow as in distributed systems. You could very well have said "let's use this terminology to have a common understanding" instead of telling him to go learn the basic concepts. Before building a cool brand, it's important to build a healthy community.

Anirudh

On Wed, May 15, 2019 at 12:03 AM Junru Shao <junrushao1...@gmail.com> wrote:

Hi Pedro,

I really appreciate that a diligent and talented engineer eagerly wants to improve our system, and I am very thankful that you have done so much for our community. However, there are some points that I believe I should mention.

While I agree with Tianqi that every design has its pros and cons, I would like to emphasize that *good taste* in system design means optimizing the bottleneck and enhancing expressiveness (and usability), i.e.
to do what needs doing, rather than polishing *trivial nits* that are irrelevant to either performance or expressiveness. Generally speaking, typed or untyped, shared_ptr or unique_ptr won't affect overall performance when it comes to deep learning workloads, especially since MXNet has an async scheduler that does good latency hiding; to me, these are not major issues worth re-designing our entire system over.

To benefit users, i.e. real-world ML practitioners, the main thing I would like to mention is that a dataflow-graph-based representation is increasingly incapable of expressing modern neural networks, because of increasingly common structures like arbitrary control flow (with continue, break, etc.), recursion, and type conjunction and disjunction. These issues will be our priority to address, and Relay addresses all of these pain points.

Another minor thing I would like to humbly mention: for the sake of our brand, it is our responsibility to be professional about terminology when writing an official proposal on Confluence. As one of numerous examples, the title of the proposal shocked me for a while; a phrase like "operators graph" reads very oddly. Correct me if I am wrong, but the compiler community would prefer the term "intermediate representation", and the distributed-systems community would prefer "dataflow graph". If you don't have background in these fields, a better way to communicate efficiently is to first familiarize yourself with the most basic concepts and then have the discussion. This is a way to save your own valuable time as well.
Again, thank you so much for your hard work, and I hope that we can work together to win customers in the future :-)

Thanks,
Junru

On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:

The core part of the proposal is to move the graph to a much more strongly typed template class. I think this is mainly a point of engineering taste, and both sides have pros and cons; let me list them before I share my thoughts on this issue:

- Typed fields certainly enjoy more compile-time type checking; on the other hand, it is hard to expose a template with explosive possibilities to frontend languages.
- More type-erased fields provide runtime flexibility to store polymorphic types as well as extensible attributes for graph optimization.
  - It is hard to use a virtual class to expose every possible attribute that an operator might have, such as inlining, storage pattern, gradient, etc.
  - Supporting a growing set of operator attributes inherently requires a type-erased attrs field.
- In contrast to your argument (that typing is a blocker to features), type-erased and typed code can both reach the same features, except that typed code gets more errors at compile time while type-erased code gets some of them at runtime.
- Templatized data structures will likely introduce an additional mental burden for developers and are not really suitable as a core data structure, because they imply an explosive number of possible data structures, while the core data structure should be a single one.

Now my view (as an MXNet PMC member) on typed vs. type-erased style: if MXNet were a pure C++ project, I might take more of the typed approach.
However, MXNet itself is a project that takes Python/Scala/Clojure and other frontend languages. The introduction of more typing may not align with that original goal, given the tradeoffs I listed above.

This proposal is really a drastic change to what NNVM does, as well as to the optimization passes, and given the scope it is, in your analogy, "a new vehicle to solve all the problems" rather than a minor patch. It will take a lot of engineering effort to bring in new features and adapt the existing ones. Because of that, it does merit a discussion about how we should think about the future MXNet 2.0.

Technically, Relay is a serious candidate. Of course Relay, as well as its core, is in C++, but it maintains the multi-language-first principle; that is why the example code was in Python. See more related discussion comparing NNVM v1 and Relay:
https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5

I think the ideal graph data structure candidate for MXNet 2.0 should have natural support for:
- functions, modules, and recursion
- control flow
- interoperation with multi-language frontends, e.g. being able to prototype graph optimizations in Python/Scala/Clojure if needed

Adding this support needs significant engineering effort, and I do hope we only have to do it once. While I don't want to force any conclusion here, I do think Relay is one such candidate.

Tianqi

On Tue, May 14, 2019 at 5:58 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:

Hi Tianqi,

Thanks for the quick response.
Could you point to examples where graph.h is being exposed in ways that would not be possible with what I propose? I don't think my proposal has any impact on language bindings, and the way I describe it doesn't affect having or not having higher-level language bindings. Please elaborate so I can understand your concern, perhaps with code examples where the graph attributes are being changed from Python? I don't think we have this in MXNet. This is such a core foundation for MXNet that I don't think we should compromise on it because other projects not directly related to MXNet might want to expose some untyped graph and node attributes. The current state makes maintaining the code very painful and is also preventing desired features, such as higher-order gradients, from being developed. I have heard from you many times how speed is critical for us to innovate in this quickly changing field.

My proposal is limited to the graph and wouldn't change, for example, the way operators are registered and operator arguments are processed.

Regarding the second point, this is the documentation about Relay on the web which I found, for example:

https://docs.tvm.ai/dev/relay_add_op.html#

Is somebody working on making Imperative::Backward use this API? That would be a big change which I'm not aware of. And adopting an IR is of a much bigger scope than the change I'm proposing here, for example.

I think I'm having difficulty understanding the arguments here. I'm saying I need to change one part of my car, and you are selling me a new vehicle? Or is your suggestion that we use Relay for the graph passes in MXNet?
I would like to see C++ code examples; Python examples are not sufficient when we talk about the core of MXNet.

Pedro.

On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:

Thanks for the proposal. Let me share some of my thoughts:

Specific comments on the proposal
-----------------------------------------------
The heavy use of generics in the Graph type is a huge departure from the type-erased data structure presented in the previous design. While we understand the advantages of typed languages (more compile-time checking) and of type-erased types (more dynamism), the heavy use of templates will actually make the project solely C++-focused, making it hard to expose intermediate (templatized) data structures to other languages like Python/Scala/Clojure.

While I fully understand some of the lessons taught in programming C++ (reduce shared_ptr, more typing, etc.), we need to think about the context of the MXNet project and **the need to support multi-language as a first-class concern**. Some of the type-erased types are design trade-offs made to support these features, and we need to think more carefully instead of just applying "rules for C++", which may bring problems.

Future of NNVM
----------------------
Given that this thread touched upon what we should do for better computational-graph handling, I would also recommend taking a look at NNVM v2 -- Relay.
Relay already addresses many of the wish-list items in the proposal, such as operator fusion, higher-order gradients, offload to hardware, isolated compilation, and deployment on edge devices and accelerators. Relay also addresses problems not yet mentioned in the proposal, including control flow, a dynamic runtime, automatic layout optimization, etc.

Tianqi

On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org> wrote:

Hi Pedro,

Thanks for taking the initiative. Skimming through the design doc, I didn't see a comparison with existing solutions such as Relay in TVM, which is already a dependency of MXNet. Could you elaborate on the comparison with existing solutions in the design doc too?

-sz

On 2019/05/14 23:49:30, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:

Hi dev@,

As a result of my deep dives into the graph machinery, I have created a new proposal to improve the operator graph in MXNet.

This would mean superseding the use of the NNVM Graph in MXNet and having a new implementation that we can use to simplify a lot of code and to do powerful graph manipulation and passes such as operator fusion and other optimizations.
As it would be a change with big impact and ramifications, your thoughts and feedback on the document would be highly appreciated, so that we can account for potentially interesting future use cases:

https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0

Pedro.