Being dismissive and condescending is exactly what has been plaguing this
project.

I agree that the last paragraph sounds very condescending and dismissive,
and it violates several of the code of conduct points listed.

On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <anirudh2...@gmail.com>
wrote:

> Hi Junru,
>
> Overall, I appreciate the points you made about the proposal.
>
> Having said that, I would like to remind everyone of the Apache Code of
> Conduct: https://www.apache.org/foundation/policies/conduct.
> "Be empathetic, welcoming, friendly and patient".
>
> I find your tone condescending. Clearly you understood what he meant from
> the context, whether you prefer to call it an IR, as in compilers, or a
> dataflow graph, as in distributed systems. You could very well have said
> "let's use this terminology to have a common understanding" instead of
> telling him to go learn the basic concepts. Before building a cool brand,
> it's important to build a healthy community.
>
> Anirudh
>
>
> On Wed, May 15, 2019 at 12:03 AM Junru Shao <junrushao1...@gmail.com>
> wrote:
>
> > Hi Pedro,
> >
> > I really appreciate that a diligent and talented engineer eagerly wants
> > to improve our system, and I am very thankful that you have done so much
> > for our community. However, there are some points that I believe I
> > should mention.
> >
> > While I agree with Tianqi that every design has its pros and cons, I
> > would emphasize that *good taste* in system design means optimizing the
> > bottleneck and enhancing expressiveness (and usability), i.e. doing what
> > needs doing, rather than polishing *trivial nits* that are irrelevant to
> > either performance or expressiveness. Generally speaking, typed or
> > untyped, shared_ptr or unique_ptr won't affect overall performance for
> > deep learning workloads, especially since MXNet has an async scheduler
> > that does good latency hiding - to me, these are not issues major enough
> > to be worth re-designing our entire system.
> >
> > To benefit users - real-world ML practitioners - the main thing I would
> > like to mention is that dataflow graph-based representations are
> > increasingly incapable of expressing modern neural networks, because of
> > increasingly common structures like arbitrary control flow (w/ continue,
> > break, etc.), recursion, and type conjunction and disjunction. These
> > issues will be our priority to address, and Relay addresses all of these
> > pain points.
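> >
> > To make this concrete, here is a minimal, self-contained sketch in
> > plain C++ (a toy halving step stands in for a real layer; none of this
> > is MXNet code): a loop whose trip count depends on runtime values has
> > no natural encoding in a static dataflow graph.
> >
> >     #include <cmath>
> >     #include <vector>
> >
> >     // Illustrative only: iterate a computation until a data-dependent
> >     // condition holds. The trip count is unknowable ahead of time, so
> >     // a static dataflow graph cannot encode this loop; the
> >     // representation needs first-class control flow (and similarly
> >     // recursion).
> >     std::vector<double> run_until_converged(std::vector<double> x) {
> >       double norm = 0.0;
> >       do {
> >         norm = 0.0;
> >         for (auto& v : x) {
> >           v *= 0.5;                // stand-in for a real layer
> >           norm += v * v;
> >         }
> >       } while (std::sqrt(norm) > 1e-3);  // data-dependent condition
> >       return x;
> >     }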
> >
> > Another minor thing I would like to humbly mention is that, for the sake
> > of our brand, it is our responsibility to be professional about
> > terminology when writing an official proposal on Confluence. As one of
> > numerous examples, the title of the proposal shocked me for a while -
> > something like "operators graph" sounds weird. Correct me if I am wrong,
> > but the compiler community would prefer the term "intermediate
> > representation", and the distributed systems community would prefer
> > "dataflow graph". If you don't have a background in these fields, a
> > better way to communicate efficiently is to first familiarize yourself
> > with the most basic concepts and then have the discussion. This is a way
> > to save your own valuable time as well.
> >
> > Again, thank you so much for your hard work, and I hope that we can work
> > together to win customers in the future :-)
> >
> > Thanks,
> > Junru
> >
> >
> > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <tqc...@cs.washington.edu>
> > wrote:
> >
> > > The core part of the proposal is to move the graph to a much more
> > > strongly typed template class.
> > > I think this is mainly a point of engineering taste, and both sides
> > > have pros and cons. Let me list them before I share my thoughts on
> > > this issue:
> > >
> > > - Typed fields certainly enjoy more compile-time type checking; on the
> > > other hand, it is hard to expose templates with an explosive number of
> > > instantiations to frontend languages.
> > > - Type-erased fields provide the runtime flexibility to store
> > > polymorphic types as well as extensible attributes for graph
> > > optimization (see the sketch below).
> > >   - It is hard to use a virtual class to expose every possible
> > > attribute that an operator might have, such as inlining, storage
> > > pattern, gradient, etc.
> > >   - Supporting a growing set of operator attributes inherently
> > > requires a type-erased attrs field.
> > > - In contrast to your argument (typing is a blocker to features),
> > > type-erased and typed code can both reach the same features, except
> > > that typed code gets more errors at compile time while type-erased
> > > code gets some of them at runtime.
> > > - Templatized data structures will likely introduce an additional
> > > mental burden for developers and are not really suitable as a core
> > > data structure.
> > >   - They imply an explosive number of possible data structures, while
> > > the core data structure should be a single one.
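> > >
> > > To make the contrast concrete, here is a minimal hypothetical sketch
> > > (not the actual NNVM code) of the two styles:
> > >
> > >     #include <any>
> > >     #include <string>
> > >     #include <unordered_map>
> > >     #include <vector>
> > >
> > >     // Typed style: attributes are compile-time fields. Every new
> > >     // attribute changes the type, and each instantiation of
> > >     // TypedGraph<...> is a distinct C++ type, which is hard to expose
> > >     // through a single C API to python/scala/clojure.
> > >     template <typename NodeAttrs>
> > >     struct TypedGraph {
> > >       std::vector<NodeAttrs> nodes;
> > >     };
> > >
> > >     // Type-erased style: one concrete type. Passes attach attributes
> > >     // into a string-keyed map of std::any and cast them at runtime,
> > >     // trading compile-time checks for extensibility.
> > >     struct ErasedGraph {
> > >       std::unordered_map<std::string, std::any> attrs;
> > >     };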
> > >
> > > Now my view (as an MXNet PMC member) on typed vs type-erased style: if
> > > MXNet were a pure C++ project, I might lean more toward the typed
> > > approach.
> > > However, MXNet itself is a project that supports python/scala/clojure
> > > and other frontend languages.
> > > The introduction of more typing may not align with that goal, given the
> > > tradeoffs I listed above.
> > >
> > > This proposal is really a drastic change to what NNVM does, as well as
> > > to the optimization passes, and given the scope it is, in your analogy,
> > > "a new vehicle to solve all the problems" rather than a minor patch. It
> > > will take a lot of engineering effort to bring in new features and
> > > adapt the existing ones.
> > > Because of that, it does merit a discussion about how we should think
> > > about the future MXNet 2.0.
> > >
> > > Technically, Relay is a serious candidate. Of course Relay, as well as
> > > its core, is in C++, but it maintains the multi-language-first
> > > principle; that is why the example code was in python.
> > > See more related discussion comparing NNVMv1 and Relay:
> > > https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
> > >
> > > I think the ideal graph data structure candidate for MXNet 2.0 should
> > > have natural support for the following (a sketch follows below):
> > > - Native support for functions, modules, and recursion
> > > - Control flow
> > > - The ability to interoperate with multi-language frontends, e.g. being
> > > able to prototype graph optimizations in python/scala/clojure if
> > > needed.
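> > >
> > > As a rough hypothetical illustration of what such support could look
> > > like structurally (illustrative names only, not Relay's actual C++
> > > class hierarchy), an expression-based IR makes functions, recursion,
> > > and control flow first-class nodes:
> > >
> > >     #include <memory>
> > >     #include <string>
> > >     #include <vector>
> > >
> > >     struct Expr { virtual ~Expr() = default; };
> > >     using ExprPtr = std::shared_ptr<Expr>;
> > >
> > >     struct Var : Expr { std::string name; };
> > >     // Functions are expressions themselves, so a function body can
> > >     // refer to the enclosing function: natural recursion.
> > >     struct Function : Expr {
> > >       std::vector<ExprPtr> params;
> > >       ExprPtr body;
> > >     };
> > >     struct Call : Expr { ExprPtr op; std::vector<ExprPtr> args; };
> > >     // Control flow is a node in the IR, not an afterthought.
> > >     struct If : Expr { ExprPtr cond, then_branch, else_branch; };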
> > >
> > > Adding this support requires significant engineering effort, and I do
> > > hope we only have to do it once. While I don't want to force any
> > > conclusion here, I do think Relay is one such candidate.
> > >
> > > Tianqi
> > >
> > >
> > > On Tue, May 14, 2019 at 5:58 PM Pedro Larroy
> > > <pedro.larroy.li...@gmail.com> wrote:
> > >
> > > > Hi Tianqi
> > > >
> > > > Thanks for the quick response.
> > > >
> > > > Could you point to examples where graph.h is being exposed in a way
> > > > that would not be possible with what I propose? I don't think my
> > > > proposal has any impact on language bindings, and the way I describe
> > > > it doesn't affect having or not having higher-level language
> > > > bindings. Please elaborate so I can understand your concern. Maybe
> > > > code examples where the graph attributes are being changed from
> > > > Python? I don't think we have this in MXNet. This is such a core
> > > > foundation for MXNet that I don't think we should compromise on it
> > > > because another project not directly related to MXNet might want to
> > > > expose some untyped graph and Node attributes. The current state
> > > > makes maintaining the code very painful and is also preventing
> > > > desired features, such as higher order gradients, from being
> > > > developed. I have heard from you many times how speed is critical
> > > > for us to innovate in this quickly changing field.
> > > >
> > > > My proposal is limited to the graph and wouldn't change, for
> > > > example, the way operators are registered or how their arguments
> > > > are processed.
> > > >
> > > >
> > > > Regarding the second point, the documentation about Relay on the web
> > > > which I found is, for example:
> > > >
> > > > https://docs.tvm.ai/dev/relay_add_op.html#
> > > >
> > > > Is somebody working on making Imperative::Backward use this API? That
> > > > would be a big change which I'm not aware of. And adopting an IR is
> > > > of a much bigger scope than the change I'm proposing here, for
> > > > example.
> > > >
> > > > I think I'm having difficulty understanding what the arguments here
> > > > are. I'm saying I need to change one piece of my car, and what you
> > > > are selling me is a new vehicle? Or is your suggestion that we use
> > > > Relay for the graph passes in MXNet?
> > > >
> > > > I would like to see C++ code examples; Python examples are not
> > > > sufficient when we talk about the core of MXNet.
> > > >
> > > > Pedro.
> > > >
> > > > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen
> > > > <tqc...@cs.washington.edu> wrote:
> > > > >
> > > > > Thanks for the proposal. Let me share some of my thoughts:
> > > > >
> > > > > Specific comments on the proposal
> > > > > -----------------------------------------------
> > > > > The heavy use of generics in the Graph type is a huge departure
> > > > > from the type-erased data structure which was presented in the
> > > > > previous design.
> > > > > While we understand the advantages of typed code (more compile-time
> > > > > checking) and of type-erased types (more dynamism), the heavy use
> > > > > of templates will actually make the project solely C++ focused,
> > > > > making it hard to expose intermediate (templatized) data structures
> > > > > to other languages like python/scala/clojure.
> > > > >
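> > > > > To illustrate the exposure problem with a hypothetical sketch (the
> > > > > entry point below is made up, not the real MXNet C API): frontend
> > > > > bindings go through a C ABI, and a C signature must name one
> > > > > concrete type.
> > > > >
> > > > >     // Hypothetical, not the real MXNet C API.
> > > > >     typedef void* GraphHandle;
> > > > >     extern "C" int HypotheticalGraphGetNumNodes(GraphHandle graph,
> > > > >                                                 int* out);
> > > > >     // One type-erased Graph means one entry point serves every
> > > > >     // frontend; with TypedGraph<Attrs>, each template
> > > > >     // instantiation would need its own family of C entry points,
> > > > >     // which does not scale.
> > > > >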
> > > > > While I fully understand some of the lessons taught in programming
> > > > > C++ (reduce shared_ptr, more typing, etc.), we need to think about
> > > > > the context of the MXNet project and **the need to support multiple
> > > > > languages as first-class citizens**.
> > > > > Some of the type-erased types are design trade-offs made to support
> > > > > these features, and we need to think more carefully instead of just
> > > > > applying "rules for C++", which may bring problems.
> > > > >
> > > > > Future of NNVM
> > > > > ----------------------
> > > > > Given that this thread has touched upon what we should do for
> > > > > better computational graph handling, I would recommend also taking
> > > > > a look at NNVMv2 -- Relay.
> > > > >
> > > > > Relay already addresses many of the wish-list items in the
> > > > > proposal, such as operator fusion, higher order gradients, offload
> > > > > to hardware, isolated compilation, deployment on edge devices and
> > > > > accelerators, etc.
> > > > > Relay also addresses problems not yet mentioned in the proposal,
> > > > > including control flow, a dynamic runtime, automatic layout
> > > > > optimization, etc.
> > > > >
> > > > > Tianqi
> > > > >
> > > > > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi Pedro,
> > > > > >
> > > > > > Thanks for taking the initiative. Skimming through the design
> > > > > > doc, I didn't see a comparison with existing solutions such as
> > > > > > Relay in TVM, which is already a dependency of MXNet. Could you
> > > > > > elaborate on the comparison with existing solutions in the design
> > > > > > doc too?
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > On 2019/05/14 23:49:30, Pedro Larroy
> > > > > > <pedro.larroy.li...@gmail.com> wrote:
> > > > > > > Hi dev@
> > > > > > >
> > > > > > > As a result of my deep dives into the graph machinery I have
> > > > > > > created a new proposal to improve the operator graph in MXNet.
> > > > > > >
> > > > > > > This would mean superseding the use of NNVM Graph in MXNet and
> > > > > > > having a new implementation that we can use to simplify a lot
> > > > > > > of code and do powerful graph manipulation and passes such as
> > > > > > > operator fusion and other optimizations.
> > > > > > >
> > > > > > > As it would be a change with big impact and ramifications,
> > > > > > > your thoughts and feedback on the document would be highly
> > > > > > > appreciated, so that we can take potential future use cases
> > > > > > > into account:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > > > > >
> > > > > > > Pedro.
> > > > > > >
