Another data point. While working with a contributor, he/she is asking to get access to the graph and values of the NDArray (me too) to be able to reason more effectively about an enhacements to the operators:
https://github.com/apache/incubator-mxnet/pull/15120 I think gathering in the wiki design constraints with respect to different activities and design proposals using the graph and pain points as the one we are discussing would be a constructive way to move forward, unless you think everything is as good as it gets right now which is what I understand from your responses. Re the shape of vector I know is one line of code to get it, but you can't get the values with a debugger from other points of the code, and that compounds to many other dynamic attributes that you can also fetch, it's a bit like dying from a thousand cuts. In the MXNet codebase we are paying a price for no benefit on flexibility as we always use those attributes, hence my point they should be typed in the graph. Please try to understand my point and help provide solutions or at least a reason why it can't be improved instead of saying there's no problem, there's still three problems in order of importance: debuggability, clarity of definition of data structures and navigability with an IDE which breaks with an untyped field keyed with string. Pedro. Pedro. On Tue, Jun 11, 2019 at 11:07 AM Pedro Larroy <[email protected]> wrote: > > To put a recent specific example and focus the discussion (there are > many as there are attributes), the shapes in the graph are a vector of > Shape set as an attribute using dmlc::any so this makes it very > difficult to debug the shapes when you have a graph object. I would > have it as a typed attribute to the graph, as we always need the > vector of shapes and operate on it while doing shape inference. > > On Tue, Jun 11, 2019 at 10:18 AM Pedro Larroy > <[email protected]> wrote: > > > > Thanks for the good discussion. > > > > I actually wasn't referring particularly to our conversations in > > github with respect of the refactors, but it's nice from you to bring > > them up. And it's ok to disagree in small things, hopefully we can > > align in the big ones. > > > > I understand that for TVM you might have different constraints with > > how dynamic you want to be for mutating the graph and doing quick > > experimentation and research but please try to understand my > > perspectives coming from a software engineering background and helping > > maintain MXNet for thousands of users and teams using it in > > production. Let's also consider how many issues we have open and our > > bandwidth to deal with additional complexity. > > > > To your pros and cons I would like to add and emphasize that currently > > the heavy use of dynamic attributes in the graph using dmlc::any has > > two very negative consequences, at least for MXNet: > > > > 1 - Makes the data structures using dmlc::any almost impossible to > > debug, as they are just binary. > > 2 - Makes the code more difficult to understand because there's no > > declaration in a data structure of the data fields it uses and its > > responsibilities. We are basically shoving all kinds of stuff using > > dmlc::any. > > 3 - You get no help from the IDE to navigate and refactor as another > > consequence. > > > > I would really like you to give me solutions to these problems or at > > least acknowledge them and tell me why do we have to pay those > > tradeoffs instead of just dismissing them as engineering taste. > > > > The more I work with MXNet the more I wish debugging was easier, and > > reading and refactoring the code, and those fields would be declared > > and typed in their corresponding data structures, for MXNet I don't > > think this would affect anything in regards the python bindings since > > they go through the typed C API anyway. > > > > Maybe we can get some inspiration from LLVM as they have bindings for > > many languages to work with the AST and have very clean APIs for the > > compilation steps. It's also OK to have an initial dynamic codebase > > for research and experimentation and then "cure" them into a solid > > maintainable one with more types and more robust... > > > > Pedro. > > > > > > > > > > > > On Fri, May 31, 2019 at 9:31 AM Tianqi Chen <[email protected]> > > wrote: > > > > > > A good infrastructure design has a long way to go and has a profound > > > impact on the project itself. That is why we always want to rethink if > > > the interface can be better done, and think about the next possible > > > infrastructure to make things better, Refactoring is certainly part of it. > > > > > > There are usually two types of refactoring we refers to : > > > 1) The major design change, in terms of class relations, data structures > > > (e.g. numpy support, adding compilation to new hardware) > > > 2) The specific choice of API, programming style(more types or > > > type-erased program) > > > > > > (1) affects the long term support of the project, introduces new features > > > if necessary and need a lot of thoughts into that. I believe the general > > > IR, compilation and numpy support belongs to that category. > > > > > > I would particularly like to talk about (2). > > > Because there is no unified correct answer in software engineering, > > > different developers may prefer different views on a certain problem. > > > Some of them have things to do with the taste developers. The change > > > could favor certain aspect of the project, but not necessarily another > > > part. > > > Refactoring wrt these sometimes does require a more thoughtful > > > conversation and make a reasonable compromise. > > > > > > For example, we have a recent discussion about whether to introduce more > > > typing into the code base, to the extent that the base data structure > > > could be templatized. > > > - The Pros of this approach > > > - It introduces more typing and compile-time error message(instead of > > > runtime checking), could help developers to find problem earlier. > > > - The Cons of the approach: > > > - Having a template in the base data structure causes ABI > > > problem(which code generated by DLL A vs DLL B) and will have potential > > > future issues. > > > - Template sometimes confuses some developers. > > > - For serialization, it is hard to anticipate all kinds of classes and > > > it is easier to have one class(any) that handles polymorphism. > > > - Because of most frontends(python) are dynamic, it is easier to > > > interface them with a type-erased API. > > > > > > As we can see there are pros and cons of bringing in more typing to the > > > change, and there is no unified answer. > > > One good example of a nice infrastructure design trade-off is DLPack > > > https://github.com/dmlc/dlpack/blob/master/include/dlpack/dlpack.h > > > This is a base data structure adopted by MXNet, Pytorch, Chainer, and > > > many other frameworks unanimously. > > > It is a type-erased data structure that erases the data type, and memory > > > allocator from the data structure and is designed to exchange > > > tensor(coming from different memory allocators) across DLL boundaries. > > > As you can see this is a good example of type-erased data structures. > > > > > > When we are having this kind of questions. It is important to have a good > > > conversation. Sometimes we have to make tradeoffs rather than bend > > > everyone-else to our will. This is what open source is about. > > > I would also like to give some examples of conversations and how design > > > decisions are resolved. It comes from the TVM community's recent > > > discussion about VM design. > > > I directly paste the github issue conversation here for the sake of > > > clarity(note that all the conversations are also mirrored to dev@tvm). > > > The background is that the community want to bring a virtual machine that > > > can execute dynamic operations more effectively. > > > > > > - The initial proposal, made by one of the committers gave a detailed > > > design based on Stack VM https://github.com/dmlc/tvm/issues/2810 > > > - As you can see that there are quite some discussions about whether > > > we want to use a different set of design, in this case, a register-based > > > version. > > > - The conversation evolves, and while the community members disagree > > > on some cases, also agrees with each other on the particular tradeoffs. > > > - After some discussions, the committers bring a tradeoff design that > > > tries to consolidate the needs of both sides and this is the final > > > solution being adopted https://github.com/dmlc/tvm/issues/2915 > > > I would like to particularly highlight the fact that: 1) there are > > > disagreements in the development process. 2) developers work together to > > > understand each others' needs and then make consensus on a perhaps better > > > design. > > > > > > There are two other particular conversations between Pedro and myself, > > > which are during his contributions. > > > - https://github.com/dmlc/tvm/pull/3037 In this case, I raised the > > > concern about API consistency, and Pedro brings up a reason why he thinks > > > it is a better idea, I agreed and we merged the PR > > > - https://github.com/dmlc/tvm/pull/3108 In this other case, there are > > > technical reasons for going both sides for the case of MXNet, we have > > > listed pros/cons about both sides and have a constructive conversation. > > > Eventually, I decided to not merge the PR after weighing in all the cases. > > > > > > I believe both are useful conversations, and while Pedro and I disagree > > > sometimes, we do agree on many other cases. The most crucial part is > > > about having a constructive conversation. > > > To summarize, I do refactoring and making things better is certainly > > > important to make the project better. And I do believe it is crucial to > > > think about all the technical consequences and make good tradeoff > > > decisions. > > > Sometimes the decision may not make every developer mostly happy, but a > > > good technical compromise could move the project forward and help the > > > community in general. > > > > > > Tianqi > > > > > > On Fri, May 31, 2019 at 6:26 AM Isabel Drost-Fromm <[email protected]> > > > wrote: > > >> > > >> > > >> > > >> Am 31. Mai 2019 14:13:30 MESZ schrieb Pedro Larroy > > >> <[email protected]>: > > >> > I think Martin does a very good job explaining why > > >> >refactoring, > > >> >reducing developer frustration and internal improvement is a crucial > > >> >productivity multiplier which includes lower cost to ship features, > > >> >less > > >> >bugs and time spent debugging. > > >> > > >> There's one aspect that's special for open source projects: if a project > > >> wants to survive long term, it should make it easy for people to get > > >> started working on the project. In my experience, refactoring and > > >> cleanup play an important role in that. So thanks also for making > > >> recruiting of new contributers better. > > >> > > >> Isabel > > >> -- > > >> This message was sent with K-9 from a mobile device with swipe to type > > >> enabled. I'm sorry for any embarrassing typos that slipped through.
