Hi Anirudh, Naveen,

Thank you so much for the gentle reminder!
I am not a native speaker, and that led to the mistake. I would like to offer my sincere apologies to Pedro. Pedro is working really hard to grow our community and improve our code base. I sincerely apologize for what I said in a hurry. Let's work hard together to grow a healthy community!

Thanks,
Junru

On Wed, May 15, 2019 at 11:51 Naveen Swamy <mnnav...@gmail.com> wrote:

Being dismissive and condescending is exactly what has been plaguing this project.

I agree the last paragraph sounds very condescending and very dismissive, and it breaks many of the points in the code of conduct listed.

On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <anirudh2...@gmail.com> wrote:

Hi Junru,

Overall, I appreciate the points you made about the proposal.

Having said that, I would like to remind you of the Apache Code of Conduct: https://www.apache.org/foundation/policies/conduct. "Be empathetic, welcoming, friendly and patient."

I find your tone condescending. Clearly you understood what he meant from the context, whether you prefer to call it IR as in compilers or dataflow as in distributed systems. You could very well have said "let's use this terminology to have a common understanding" instead of telling him to go learn the basic concepts. Before building a cool brand, it's important to build a healthy community.

Anirudh

On Wed, May 15, 2019 at 12:03 AM Junru Shao <junrushao1...@gmail.com> wrote:

Hi Pedro,

I really appreciate that a diligent and talented engineer eagerly wants to improve our system, and I am very thankful that you have done so much for our community. However, there are some points that I believe I should mention.

While I agree with Tianqi that every design has its pros and cons, I would like to emphasize that *good taste* in system design means optimizing the bottleneck and enhancing expressiveness (and usability), i.e.
to do what needs doing, rather than polishing *trivial nits* that are irrelevant to either performance or expressiveness. Generally speaking, typed or untyped, shared_ptr or unique_ptr won't affect overall performance when it comes to deep learning workloads, especially since MXNet has an async scheduler that does good latency hiding; to me, these are not major issues worth re-designing our entire system over.

To benefit users, i.e. real-world ML practitioners, the main thing I would like to mention is that a dataflow-graph-based representation is increasingly incapable of expressing modern neural networks, because of increasingly common structures like arbitrary control flow (with continue, break, etc.), recursion, and type conjunction and disjunction. These issues will be our priority to address, and Relay addresses all of these pain points.

Another minor thing I would like to humbly mention: for the sake of our brand, it is our responsibility to be professional about terminology when writing an official proposal on Confluence. As one of numerous examples, the title of the proposal shocked me for a while; a phrase like "operators graph" reads very oddly. Correct me if I am wrong, but the compiler community would prefer the term "intermediate representation", and the distributed-systems community would prefer "dataflow graph". If you don't have background in these fields, a better way to communicate efficiently is to first familiarize yourself with the most basic concepts and then have the discussion. This is a way to save your own valuable time as well.
Again, thank you so much for your hard work, and I hope that we can work together to win customers in the future :-)

Thanks,
Junru

On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:

The core part of the proposal is to move the graph to a much more strongly typed template class. I think this is mainly a point of engineering taste, and both sides have pros and cons; let me list them before I share my thoughts on this issue:

- Typed fields certainly enjoy more compile-time type checking; on the other hand, it is hard to expose a template with explosive possibilities to frontend languages.
- More type-erased fields provide runtime flexibility to store polymorphic types as well as extensible attributes for graph optimization.
  - It is hard to use a virtual class to expose every possible attribute that an operator might have, such as inlining, storage pattern, gradient, etc.
  - Supporting a growing set of operator attributes inherently requires a type-erased attrs field.
- In contrast to your argument (that typing is a blocker to features), type-erased and typed code can both reach the same features, except that typed code gets more errors at compile time while type-erased code gets some of them at runtime.
- Templatized data structures will likely introduce an additional mental burden for developers and are not really suitable as a core data structure, because they imply an explosive number of possible data structures, while the core data structure should be a single one.

Now my view (as an MXNet PMC member) on typed vs. type-erased style: if MXNet were a pure C++ project, I might take more of the typed approach.
However, MXNet itself is a project that takes Python/Scala/Clojure and other frontend languages. The introduction of more typing may not align with that original goal, given the tradeoffs I listed above.

This proposal is really a drastic change to what NNVM does, as well as to the optimization passes, and given the scope it is, in your analogy, "a new vehicle to solve all the problems" rather than a minor patch. It will take a lot of engineering effort to bring in new features and adapt the existing ones. Because of that, it does merit a discussion about how we should think about the future MXNet 2.0.

Technically, Relay is a serious candidate. Of course Relay, as well as its core, is in C++, but it maintains the multi-language-first principle; that is why the example code was in Python. See more related discussion comparing NNVM v1 and Relay:
https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5

I think the ideal graph data structure candidate for MXNet 2.0 should have natural support for:
- functions, modules, and recursion
- control flow
- interoperation with multi-language frontends, e.g. being able to prototype graph optimizations in Python/Scala/Clojure if needed

Adding this support needs significant engineering effort, and I do hope we only have to do it once. While I don't want to force any conclusion here, I do think Relay is one such candidate.

Tianqi

On Tue, May 14, 2019 at 5:58 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:

Hi Tianqi,

Thanks for the quick response.
Could you point to examples where graph.h is being exposed in ways that would not be possible with what I propose? I don't think my proposal has any impact on language bindings, and the way I describe it doesn't affect having or not having higher-level language bindings. Please elaborate so I can understand your concern, perhaps with code examples where the graph attributes are being changed from Python? I don't think we have this in MXNet. This is such a core foundation for MXNet that I don't think we should compromise on it because other projects not directly related to MXNet might want to expose some untyped graph and node attributes. The current state makes maintaining the code very painful and is also preventing desired features, such as higher-order gradients, from being developed. I have heard from you many times how speed is critical for us to innovate in this quickly changing field.

My proposal is limited to the graph and wouldn't change, for example, the way operators are registered and operator arguments are processed.

Regarding the second point, this is the documentation about Relay on the web which I found, for example:

https://docs.tvm.ai/dev/relay_add_op.html#

Is somebody working on making Imperative::Backward use this API? That would be a big change which I'm not aware of. And adopting an IR is of a much bigger scope than the change I'm proposing here, for example.

I think I'm having difficulty understanding the arguments here. I'm saying I need to change one part of my car, and you are selling me a new vehicle? Or is your suggestion that we use Relay for the graph passes in MXNet?
I would like to see C++ code examples; Python examples are not sufficient when we talk about the core of MXNet.

Pedro.

On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:

Thanks for the proposal. Let me share some of my thoughts:

Specific comments on the proposal
-----------------------------------------------
The heavy use of generics in the Graph type is a huge departure from the type-erased data structure presented in the previous design. While we understand the advantages of typed languages (more compile-time checking) and of type-erased types (more dynamism), the heavy use of templates will actually make the project solely C++-focused, making it hard to expose intermediate (templatized) data structures to other languages like Python/Scala/Clojure.

While I fully understand some of the lessons taught in programming C++ (reduce shared_ptr, more typing, etc.), we need to think about the context of the MXNet project and **the need to support multi-language as a first-class concern**. Some of the type-erased types are design trade-offs made to support these features, and we need to think more carefully instead of just applying "rules for C++", which may bring problems.

Future of NNVM
----------------------
Given that this thread touched upon what we should do for better computational-graph handling, I would also recommend taking a look at NNVM v2 -- Relay.
Relay already addresses many of the wish-list items in the proposal, such as operator fusion, higher-order gradients, offload to hardware, isolated compilation, and deployment on edge devices and accelerators. Relay also addresses problems not yet mentioned in the proposal, including control flow, a dynamic runtime, automatic layout optimization, etc.

Tianqi

On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org> wrote:

Hi Pedro,

Thanks for taking the initiative. Skimming through the design doc, I didn't see a comparison with existing solutions such as Relay in TVM, which is already a dependency of MXNet. Could you elaborate on the comparison with existing solutions in the design doc too?

-sz

On 2019/05/14 23:49:30, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:

Hi dev@,

As a result of my deep dives into the graph machinery, I have created a new proposal to improve the operator graph in MXNet.

This would mean superseding the use of the NNVM Graph in MXNet and having a new implementation that we can use to simplify a lot of code and to do powerful graph manipulation and passes such as operator fusion and other optimizations.
As it would be a change with big impact and ramifications, your thoughts and feedback on the document would be highly appreciated, so that we can account for potentially interesting future use cases:

https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0

Pedro.