This is a good point. I believe the main question here is not SSA vs.
the alternatives, but rather CFG vs. structured control flow.

SSA is generally equivalent to ANF or dataflow if you ignore the phi
nodes and CFG blocks. The current Relay IR uses more structured control
flow, so it does not have an explicit CFG (i.e., no goto).

I believe that for deep learning it is a good idea to keep the
highest-level information when possible, and a structured control-flow
block is certainly more informative (while eliminating the possibility
of goto). Mutation is something that could be handled in Relay with
explicit annotation.
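
To make the CFG-vs-structured contrast concrete, here is a small sketch (plain Python for illustration; this is not the actual Relay API) of the same branch written in CFG+SSA style and in structured/ANF style:

```python
# The same computation in two representations.
#
# 1) CFG + SSA, shown as pseudocode in comments: explicit basic blocks,
#    with a phi node selecting the value at the merge point.
#
#      entry:  t0 = x > 0 ; br t0, then, else
#      then:   t1 = x * 2 ; jmp merge
#      else:   t2 = x - 1 ; jmp merge
#      merge:  t3 = phi(t1, t2)   # value depends on predecessor block
#
# 2) Structured control flow / ANF: the branch is an expression, so the
#    "phi" is simply the value of the if-expression. There are no blocks
#    and no goto, and every intermediate value is let-bound to a name.
def branch_anf(x):
    t0 = x > 0
    t3 = (x * 2) if t0 else (x - 1)
    return t3
```

In the structured form the merge point is implicit in the expression itself, which is part of what makes it easier to analyze and differentiate.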

Most of the current deep learning programs contain parts that need to
be automatically differentiated, which are usually pure, and parts that
update parameters, which can be explicitly marked. The central question
is: do we represent the pure parts directly in the IR and maintain the
necessary high-level structures, or do we allow the IR to represent
more arbitrary programs and try to recover those structures through
analysis (e.g., pointer alias analysis)? I think the former would be
easier, given that deep learning programs are already pretty high level.
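
As an illustration of this split (a minimal sketch in plain Python; the function names and the dict-based parameter store are assumptions for the example, not MXNet or Relay API):

```python
# Pure part: the loss is a pure function of the parameters, so it is
# safe to differentiate automatically (here the gradient is written by
# hand to keep the sketch self-contained).
def loss(w, x, y):
    return (w * x - y) ** 2

def grad_loss_w(w, x, y):
    # d/dw of (w*x - y)^2 is 2 * (w*x - y) * x
    return 2.0 * (w * x - y) * x

# Impure part: the parameter update is the only mutation, and it lives
# in an explicitly marked step rather than hidden inside the pure
# region that the IR wants to analyze and optimize.
def sgd_step(params, x, y, lr=0.1):
    g = grad_loss_w(params["w"], x, y)
    params["w"] -= lr * g  # the explicitly annotated mutation
    return params
```

One gradient step on `w = 0` with `x = 1, y = 1` moves the parameter toward the target and reduces the loss, confirming the two halves compose correctly.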

There is also an ongoing discussion about adding a CFG back to Relay to
handle rare cases that do not have to be optimized. But from what I
have seen so far, structured control flow seems to fit most of the
needs.

Tianqi

On Wed, May 15, 2019 at 12:01 PM Zach Kimberg <zachary.kimb...@gmail.com>
wrote:

> I would like to raise another option to get back on the topic of changing
> the Operator graph structure. On the page discussing Relay IR [1], it
> discusses mainly the difference between a data flow graph like we use now
> and A-normal [2] which is used in some functional compilers. Is there a
> reason we do not want to use a structure based on Single Static Assignment
> Form (wikipedia explanation [3], lecture note explanation [4]). It is used
> almost universally in the compiler community including in LLVM (clang),
> GCC, Oracle JVM, PyPy, Go, Webkit, and Swift [5]. The major reason behind
> its pervasiveness is that it has proven very effective for analysis and
> transformations when dealing with control flow.
>
> One possible concern is that it might make automatic differentiation more
> difficult [6]. While it certainly is more complicated than a pure
> functional approach, the functional approach requires users to use
> functional programming. Especially with the languages we support now, that
> doesn't seem like a reasonable assumption. Given that the users are already
> introducing the complexity inherent in imperative programming, we have to
> deal with the increased complexity regardless. I think it might be easier
> to have the tools to deal with that rather than attempting to coerce users
> into a different programming paradigm or convert code between paradigms.
> Furthermore, this may become more important if users are increasingly
> making use of control flow like Junru said.
>
> Zach
>
>
> [1] - https://docs.tvm.ai/dev/relay_intro.html
> [2] - https://en.wikipedia.org/wiki/A-normal_form
> [3] - https://en.wikipedia.org/wiki/Static_single_assignment_form
> [4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf
> [5] -
>
> https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form
> [6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2
>
> On Wed, May 15, 2019 at 11:51 AM Naveen Swamy <mnnav...@gmail.com> wrote:
>
> > Being dismissive and condescending has been exactly what is plaguing this
> > project.
> >
> > I agree the last paragraph sounds very condescending and very dismissive,
> > and it breaks many of the listed codes of conduct.
> >
> > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> > anirudh2...@gmail.com>
> > wrote:
> >
> > > Hi Junru,
> > >
> > > Overall, I appreciate the points you made about the proposal.
> > >
> > > Having said that, I would like to remind the Apache Code of Conduct :
> > > https://www.apache.org/foundation/policies/conduct.
> > > "Be empathetic, welcoming, friendly and patient".
> > >
> > > I find your tone condescending. Clearly you understood from the context
> > > what he meant, whether you prefer to call it an IR as in compilers or a
> > > data-flow graph as in distributed systems. You could very well have said
> > > "let's use this terminology to have a common understanding" instead of
> > > telling him to go learn the basic concepts. Before building a cool
> > > brand, it's important to build a healthy community.
> > >
> > > Anirudh
> > >
> > >
> > > On Wed, May 15, 2019 at 12:03 AM Junru Shao <junrushao1...@gmail.com>
> > > wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > I really appreciate that a diligent and talented engineer eagerly wants
> > > > to improve our system, and I am very thankful that you have done so
> > > > much for our community. However, there are some points that I believe
> > > > I should mention.
> > > >
> > > > While I agree with Tianqi that every design has its pros and cons, I
> > > > would love to emphasize that *good taste* in system design means
> > > > optimizing the bottleneck and enhancing expressiveness (and usability),
> > > > i.e. doing what needs doing, rather than picking at *trivial nits* that
> > > > are irrelevant to either performance or expressiveness. Generally
> > > > speaking, typed or untyped, shared_ptr or unique_ptr won't affect the
> > > > overall performance when it comes to deep learning workloads,
> > > > especially when we have an async scheduler that does good latency
> > > > hiding in MXNet - to me, these are not major issues worth re-designing
> > > > our entire system for.
> > > >
> > > > To benefit users - real-world ML practitioners - the main thing I
> > > > would love to mention is that dataflow graph-based representations are
> > > > increasingly incapable of expressing modern neural networks, because of
> > > > increasingly common structures like arbitrary control flow (w/
> > > > continue, break, etc.), recursion, type conjunction and disjunction,
> > > > etc. Addressing these pain points will be our priority, and Relay
> > > > addresses all of them.
> > > >
> > > > Another minor thing I would love to humbly mention is that, for the
> > > > sake of our brand, it is our responsibility to be professional about
> > > > terminology when writing an official proposal on Confluence. As one of
> > > > the numerous examples, the title of the proposal really shocked me for
> > > > a while - something like "operators graph" reads very oddly. Educate
> > > > me if I am wrong, but the compiler community would prefer the term
> > > > "intermediate representation", and the distributed systems community
> > > > would prefer "dataflow graph". If you don't have knowledge in these
> > > > fields, a better path to efficient communication is to first
> > > > familiarize yourself with the most basic concepts and then have the
> > > > discussion. This is a way to save your own valuable time as well.
> > > >
> > > > Again, thank you so much for your hard work, and hope that we could
> > work
> > > > together to win customers in the future :-)
> > > >
> > > > Thanks,
> > > > Junru
> > > >
> > > >
> > > > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <
> tqc...@cs.washington.edu>
> > > > wrote:
> > > >
> > > > > The core part of the proposal is to turn the graph into a much more
> > > > > strongly typed template class.
> > > > > I think this is mainly a point of engineering taste, and both sides
> > > > > have pros and cons; let me list them before I share my thoughts on
> > > > > this issue:
> > > > >
> > > > > - Typed fields certainly enjoy more compile-time type checking; on
> > > > > the other hand, it is hard to expose a template with explosive
> > > > > possibilities to frontend languages.
> > > > > - More type-erased fields provide runtime flexibility to store
> > > > polymorphic
> > > > > types as well as extensible attributes for graph optimization
> > > > >   - It is hard to use a virtual class to expose every possible
> > > attribute
> > > > > that an operator might have, such as inlining, storage pattern,
> > > gradient
> > > > > etc..
> > > > > - The nature of supporting a growing set of operator attributes
> > > > > requires a type-erased attrs field.
> > > > > - In contrast to your argument (typing is a blocker to features),
> > > > > type-erased and typed code can both reach the same features, except
> > > > > that typed code gets more errors at compile time while type-erased
> > > > > code gets some of them at runtime.
> > > > > - Templatized data structures will likely introduce additional
> > > > > mental burden for developers and are not really suitable as a core
> > > > > data structure
> > > > >    - Because they imply an explosive number of possible data
> > > > > structures, while the core data structure should be a single one.
> > > > >
> > > > > Now my view (as an MXNet PMC member) on typed vs. type-erased style:
> > > > > if MXNet were a pure C++ project, I might lean more toward the typed
> > > > > approach. However, MXNet is a project that supports Python, Scala,
> > > > > Clojure, and other frontend languages. Introducing more typing may
> > > > > not align with that goal, given the tradeoffs I listed above.
> > > > >
> > > > > This proposal is really a drastic change from what NNVM does, as
> > > > > well as from the optimization passes, and given the scope it is, in
> > > > > your analogy, "a new vehicle to solve all the problems" rather than
> > > > > a minor patch. It will take a lot of engineering effort to bring in
> > > > > new features and adapt the existing ones.
> > > > > Because of that, it does merit a discussion about how we should
> > > > > think about the future MXNet 2.0.
> > > > >
> > > > > Technically, Relay is a serious candidate. Of course Relay, as well
> > > > > as its core, is in C++, but it maintains the multi-language-first
> > > > > principle; that is why the example code was in Python.
> > > > > See more related discussion comparing NNVMv1 and relay:
> > > > >
> https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
> > > > >
> > > > > I think the ideal graph data structure candidate for MXNet2.0
> should
> > > have
> > > > > natural support for:
> > > > > - Native support for functions, modules, and recursion
> > > > > - Control flow
> > > > > - The ability to interoperate with multi-language frontends, e.g.
> > > > > being able to prototype graph optimizations in Python/Scala/Clojure
> > > > > if needed.
> > > > >
> > > > > Adding this support needs significant engineering effort, and I do
> > > hope
> > > > we
> > > > > only have to do it once. While I don't want to force any conclusion
> > > here,
> > > > > I do think Relay is one such candidate.
> > > > >
> > > > > Tianqi
> > > > >
> > > > >
> > > > > On Tue, May 14, 2019 at 5:58 PM Pedro Larroy <
> > > > pedro.larroy.li...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Tianqi
> > > > > >
> > > > > > Thanks for the quick response.
> > > > > >
> > > > > > Could you point to examples where graph.h is being exposed which
> > > would
> > > > > > not be possible with what I propose? I don't think my proposal has
> > > > > > any impact on language bindings, and the way I describe it doesn't
> > > > > > affect having or not having higher-level language bindings.
> > Please
> > > > > > elaborate so I can understand your concern.  Maybe code examples
> > > where
> > > > > > the graph attributes are being changed from Python?  I don't
> think
> > we
> > > > > > have this in MXNet. This is such a core foundation for MXNet that
> > > > > > I don't think we should compromise on it because another project
> > > > > > not directly related to MXNet might want to expose some untyped
> > > > > > graph and Node attributes. The current status makes maintaining
> > > > > > the code very painful and is also preventing desired features such
> > > > > > as higher-order gradients from being developed. I have heard from
> > > > > > you many times how speed is critical for us to innovate in this
> > > > > > quickly changing field.
> > > > > >
> > > > > > My proposal is limited to the graph and wouldn't change the way
> > > > > > operators are registered and arguments are processed for
> operators
> > > for
> > > > > > example.
> > > > > >
> > > > > >
> > > > > > Regarding the second point, the documentation about Relay in the
> > web
> > > > > > which I found for example:
> > > > > >
> > > > > > https://docs.tvm.ai/dev/relay_add_op.html#
> > > > > >
> > > > > > Is somebody working on making Imperative::Backward use this API?
> > > > > > That would be a big change which I'm not aware of. And adopting an
> > > > > > IR is of a much bigger scope than the change I'm proposing here,
> > > > > > for example.
> > > > > >
> > > > > > I think I'm having difficulty understanding the arguments here.
> > > > > > I'm saying I need to change one piece of my car, and what you are
> > > > > > selling me is a new vehicle? Or is your suggestion that we use
> > > > > > Relay for the graph passes in MXNet?
> > > > > >
> > > > > > I would like to see C++ code examples, Python examples are not
> > > > > > sufficient when we talk about the core MXNet.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <
> > > tqc...@cs.washington.edu>
> > > > > > wrote:
> > > > > > >
> > > > > > > Thanks for the proposal. Let me share some of my thoughts:
> > > > > > >
> > > > > > > Specific comments on the proposal
> > > > > > > -----------------------------------------------
> > > > > > > The heavy use of generics in the Graph type is a huge departure
> > > > > > > from the type-erased data structure presented in the previous
> > > > > > > design. While we understand the advantages of typed languages
> > > > > > > (more compile-time checking) and of type-erased types (more
> > > > > > > dynamism), heavy use of templates will actually make the project
> > > > > > > solely C++ focused, making it hard to expose intermediate
> > > > > > > (templatized) data structures to other languages like
> > > > > > > Python/Scala/Clojure.
> > > > > > >
> > > > > > > While I fully understand some of the lessons taught in C++
> > > > > > > programming (reduce shared_ptr, more typing, etc.), we need to
> > > > > > > think about the context of the MXNet project and **the need to
> > > > > > > support multiple languages as first-class citizens**.
> > > > > > > Some of the type-erased types are design trade-offs made to
> > > > > > > support these features, and we need to think more carefully
> > > > > > > instead of just applying "rules for C++", which may bring
> > > > > > > problems.
> > > > > > >
> > > > > > > Future of NNVM
> > > > > > > ----------------------
> > > > > > > Given that this thread touched upon what we should do for
> better
> > > > > > > computational graph handling. I would recommend also to take a
> > look
> > > > at
> > > > > > > NNVMv2 -- relay.
> > > > > > >
> > > > > > > Relay addresses many of the wish-lists in the proposal already,
> > > such
> > > > as
> > > > > > > operator fusion, high order gradient, offload to hardware,
> > isolated
> > > > > > > compilation, deployment on edge and accelerators etc.
> > > > > > > Relay also addresses problems not yet mentioned in the proposal,
> > > > > > > including control flow, a dynamic runtime, automatic layout
> > > > > > > optimization, etc.
> > > > > > >
> > > > > > > Tianqi
> > > > > > >
> > > > > > > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Pedro,
> > > > > > > >
> > > > > > > > Thanks for taking the initiative. Skimming through the design
> > > > > > > > doc, I didn't see a comparison with existing solutions such as
> > > > > > > > Relay in TVM, which is already a dependency of MXNet. Could you
> > > > > > > > elaborate on the comparison with existing solutions in the
> > > > > > > > design doc too?
> > > > > > > >
> > > > > > > > -sz
> > > > > > > >
> > > > > > > > On 2019/05/14 23:49:30, Pedro Larroy <
> > > pedro.larroy.li...@gmail.com
> > > > >
> > > > > > > > wrote:
> > > > > > > > > Hi dev@
> > > > > > > > >
> > > > > > > > > As a result of my deep dives on the graph machinery I have
> > > > created
> > > > > a
> > > > > > > > > new proposal to improve the operator graph in MXNet.
> > > > > > > > >
> > > > > > > > > This would mean superseding the use of NNVM Graph in MXNet
> > and
> > > > > having
> > > > > > > > > a new implementation that we can use to simplify a lot of
> > code
> > > > and
> > > > > do
> > > > > > > > > powerful graph manipulation and passes such as operator
> > fusion
> > > > and
> > > > > > > > > other optimizations.
> > > > > > > > >
> > > > > > > > > As it would be a change with big impact and ramifications,
> > > > > > > > > your thoughts and feedback on the document would be highly
> > > > > > > > > appreciated, so that we can take potential future use cases
> > > > > > > > > into account:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > > > > > > >
> > > > > > > > > Pedro.
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
