The core of the proposal is to turn the graph into a much more strongly typed template class. I think this is mainly a matter of engineering taste, and both sides have pros and cons; let me list them before I share my thoughts on this issue:
- Typed fields certainly enjoy more compile-time type checking; on the other hand, it is hard to expose a template with an explosive number of instantiations to frontend languages.
- Type-erased fields provide the runtime flexibility to store polymorphic types as well as extensible attributes for graph optimization:
  - It is hard for a virtual class to expose every possible attribute an operator might have, such as inlining, storage pattern, gradient, etc.
  - The need to support a growing set of operator attributes naturally requires a type-erased attrs field.
- In contrast to your argument (that typing is a blocker to features), type-erased and typed code can both reach the same features; typed code surfaces more errors at compile time, while type-erased code surfaces some of them at runtime.
- Templatized data structures will likely introduce an additional mental burden for developers and are not really suitable as a core data structure, because they imply an explosive number of possible data structures, while the core data structure should be a single one.

Now my view (as an MXNet PMC member) on typed vs type-erased style: if MXNet were a pure C++ project, I might lean more toward the typed approach. However, MXNet is a project that supports Python/Scala/Clojure and other frontend languages, and the introduction of more typing may not align with that goal, given the trade-offs I listed above.

This proposal is really a drastic change to what NNVM does, as well as to the optimization passes, and given the scope it is, in your analogy, "a new vehicle to solve all the problems" rather than a minor patch. It will take a lot of engineering effort to bring in new features and adapt the existing ones. Because of that, it does merit a discussion about how we should think about a future MXNet 2.0. Technically, Relay is a serious candidate. Of course Relay, including its core, is written in C++, but it maintains the multi-language-first principle; that is why the example code was in Python.
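To make the trade-off concrete, here is a minimal sketch of the two styles. This is not actual NNVM or MXNet code: the names `TypedGraph` and `ErasedGraph` are hypothetical, and `std::any` stands in for whatever type-erasure mechanism the real code base uses.

```cpp
#include <any>
#include <string>
#include <unordered_map>
#include <vector>

// Typed style: the attribute type is a template parameter, so mismatches
// are caught at compile time -- but every attribute combination yields a
// distinct C++ type, which is hard to expose to Python/Scala/Clojure.
template <typename NodeAttr>
struct TypedGraph {
  std::vector<NodeAttr> nodes;  // each node carries a statically typed attribute
};

// Type-erased style: attributes live in a string-keyed map of std::any,
// so new attribute kinds (inlining, storage pattern, gradient, ...) can be
// added without changing the graph type. Type errors surface at runtime.
struct ErasedGraph {
  std::unordered_map<std::string, std::any> attrs;

  template <typename T>
  const T& GetAttr(const std::string& key) const {
    // Throws std::bad_any_cast on a type mismatch -- the runtime analogue
    // of the compile-time error the typed version would give.
    return std::any_cast<const T&>(attrs.at(key));
  }
};
```

The single `ErasedGraph` type is what lets a C API (and therefore every language frontend) handle any graph uniformly, at the cost of deferring type checks to runtime.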
See more related discussion comparing NNVM v1 and Relay:
https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5

I think the ideal graph data structure candidate for MXNet 2.0 should have natural support for:
- functions, modules, and recursion
- control flow
- interoperation with multi-language frontends, e.g. being able to prototype graph optimizations in Python/Scala/Clojure if needed

Adding this support requires significant engineering effort, and I do hope we only have to do it once. While I don't want to force any conclusion here, I do think Relay is one such candidate.

Tianqi

On Tue, May 14, 2019 at 5:58 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> Hi Tianqi
>
> Thanks for the quick response.
>
> Could you point to examples where graph.h is being exposed which would
> not be possible with what I propose? I don't think my proposal has
> any impact on language bindings, and the way I describe it doesn't
> affect having or not having higher-level language bindings. Please
> elaborate so I can understand your concern. Maybe code examples where
> the graph attributes are being changed from Python? I don't think we
> have this in MXNet. This is such a core foundation for MXNet that I
> don't think we should compromise on it because other projects not
> directly related to MXNet might want to expose some untyped graph and
> Node attributes. The current status makes maintaining the code very
> painful and is also preventing desired features such as higher-order
> gradients from being developed. I have heard from you many times how
> speed is critical for us to innovate in this quickly changing field.
>
> My proposal is limited to the graph and wouldn't change the way
> operators are registered and arguments are processed for operators,
> for example.
>
> Regarding the second point, the documentation about Relay on the web
> which I found, for example:
>
> https://docs.tvm.ai/dev/relay_add_op.html#
>
> Is somebody working on making Imperative::Backward use this API? This
> would be a big change which I'm not aware of. And using an IR is of a
> much bigger scope than the change I'm proposing here, for example.
>
> I think I'm having difficulty understanding what the arguments here
> are. I'm saying I need to change one piece of my car, and what you are
> selling me is a new vehicle? Or is your suggestion that we use Relay
> for the graph passes in MXNet?
>
> I would like to see C++ code examples; Python examples are not
> sufficient when we talk about the core of MXNet.
>
> Pedro.
>
> On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <tqc...@cs.washington.edu>
> wrote:
> >
> > Thanks for the proposal. Let me share some of my thoughts:
> >
> > Specific comments on the proposal
> > -----------------------------------------------
> > The heavy use of generics in the Graph type is a huge departure from
> > the type-erased data structure presented in the previous design.
> > While we understand the advantages of typed languages (more
> > compile-time checking) and type-erased types (more dynamism), the
> > heavy use of templates will actually make the project solely
> > C++-focused, making it hard to expose intermediate (templatized)
> > data structures to other languages like Python/Scala/Clojure.
> >
> > While I fully understand some of the lessons taught in programming
> > C++ (reduce shared_ptr, more typing, etc.), we need to think about
> > the context of the MXNet project and **the need to support
> > multi-language as a first-class concern**.
> > Some of the type-erased types are design trade-offs made to support
> > these features, and we need to think more carefully instead of just
> > applying "rules for C++", which may bring problems.
> >
> > Future of NNVM
> > ----------------------
> > Given that this thread touched upon what we should do for better
> > computational graph handling, I would also recommend taking a look
> > at NNVM v2 -- Relay.
> >
> > Relay addresses many of the wish-list items in the proposal already,
> > such as operator fusion, higher-order gradients, offload to
> > hardware, isolated compilation, deployment on edge devices and
> > accelerators, etc.
> > Relay also addresses problems not yet mentioned in the proposal,
> > including control flow, a dynamic runtime, automatic layout
> > optimization, etc.
> >
> > Tianqi
> >
> > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org> wrote:
> >
> > > Hi Pedro,
> > >
> > > Thanks for taking the initiative. Skimming through the design doc,
> > > I didn't see a comparison with existing solutions such as Relay in
> > > TVM, which is already a dependency of MXNet. Could you elaborate on
> > > the comparison with existing solutions in the design doc too?
> > >
> > > -sz
> > >
> > > On 2019/05/14 23:49:30, Pedro Larroy <pedro.larroy.li...@gmail.com>
> > > wrote:
> > > > Hi dev@
> > > >
> > > > As a result of my deep dives into the graph machinery I have
> > > > created a new proposal to improve the operator graph in MXNet.
> > > >
> > > > This would mean superseding the use of the NNVM Graph in MXNet
> > > > and having a new implementation that we can use to simplify a lot
> > > > of code and do powerful graph manipulation and passes such as
> > > > operator fusion and other optimizations.
> > > >
> > > > As it would be a change with big impact and ramifications, your
> > > > thoughts and feedback on the document would be highly appreciated
> > > > so we can take potential future use cases into account:
> > > >
> > > > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > >
> > > > Pedro.