tqchen commented on PR #89:
URL: https://github.com/apache/tvm-rfcs/pull/89#issuecomment-1224114184

   Thank you, everyone, for the discussions here. Let us take a step back and 
look at the non-technical parts of the conversation. A lot of our discussions 
come from two goals:
   
   G0: Maintain stable evolution of the solutions for our common use-cases.
   G1: Welcome new improvements, land our technical commitments in a timely 
fashion, continue to reinvent ourselves, and welcome new community members who 
bring new use cases.
   
   Both goals are very important. G0 ties to our ability to continuously 
support our current use cases. G1 is also essential to our viability as a 
solution, so we can grow as a community and stay competitive in a fast-evolving 
machine learning compilation landscape.
   
   Enabling both has always been an important theme of long-lived projects. 
Deep learning frameworks are a common reference point here. Usually, this is 
done in roughly three phases:
   S0: Introduction of a new feature/component as an optional module.
   S1: Evolving the overall solutions to make use of the new component.
   S2: Consider deprecating some of the existing solutions, or evolve the 
solutions toward a consolidation point.
   
   Each stage carries a different level of commitment and would normally 
entail different gating criteria.
   
   For example, PyTorch introduced TorchFX as an optional module that supports 
graph tracing and export. It had some overlapping capabilities with 
TorchScript. The PyTorch community is collectively evolving some of the 
compilations (TorchDynamo) to make use of FX. As of now, there is not yet an 
announcement of S2 from the community.
   
   Encouraging S0 and making it easy to do helps us enable G1. Too high a 
barrier here can discourage community contributions, leave mainline lacking 
the latest features, and hurt our competitiveness. This is especially 
important given that the machine learning compilation landscape still remains 
open, and the ability to support symbolic shapes and training in a timely 
manner helps bring in users and contributors who would otherwise turn to 
alternatives.
   
   G0 is equally important here. In many cases, it boils down to making 
careful and informed decisions regarding evolution (S1 and S2), and to 
ensuring that at the S0 stage there is limited disruption to the existing 
infrastructure. Importantly, not every module/feature has to go through all 
stages, and in common practice the decisions at each stage are usually not 
made at the same time.
   
   We can find examples of S0 cases in TVM as well. For example, USMP was 
initially designed for specific cases like AOT. We welcomed these improvements 
to unblock needs in embedded settings early. Through USMP we found the need 
for tir.alloc_const, which related to evolving existing infrastructure (S1). 
As a result, we had a more in-depth discussion. Additionally, we are bringing 
in the effort to further enable USMP in a broader setting as part of S1. At 
some point, we might consider consolidating all memory allocations as S2 – 
note that many community members are collectively working toward that goal, 
but we are not yet at a point to make such a decision. As another example, we 
enabled cascaders that are specifically designed for the micro-NPU, which had 
some domain overlap with the arithmetic affine module, but were nevertheless 
brought in without consolidation because we believed there was enough interest 
and maintenance support for the module. Finally, the unpacked_api was 
specifically enabled for extremely low-resource settings, and we allowed 
S0-level inclusion despite some inconsistency with the PackedFunc API.
   
   Of course, we do not want to enable random things in the codebase, which 
ties back to the maintenance-overhead concern. One of the questions we want to 
ask here is whether the module has enough support from the community to allow 
continued maintenance. Additionally, we should weigh the engineering support 
we gain by welcoming additional community members who are interested in these 
needs and would otherwise look elsewhere.
   
   Our overall thought process and decision points for each stage can be 
different – indeed they should be, so that we can enable both G0 and G1. Nor 
do all modules have to go through all the stages.
   
   For S0, we would expect there to be enough champions in the community with 
a self-contained plan. For important features, we would expect, say, more than 
three committers who can champion the module, along with significant community 
support to maintain it. Additionally, S0 should be made as minimally 
disruptive (with respect to the current infrastructure) as possible. To 
encourage G1, we can overlook some level of duplication (as in the TorchFX and 
TorchScript case, or USMP and other allocators when they landed at S0), 
considering the additional community support we gain to maintain them.
   
   S1 and S2 would involve more careful discussion and coordination, with 
greater detail on some of the key points. They will also likely happen at 
different points in time, so that we can make informed decisions.
   
   This particular RFC is at the S0 stage, and intentionally so. As the RFC 
states, there is no proposal to make S1/S2 decisions in this RFC. Many of our 
current discussions are around S1/S2 – the future evolution of the system. 
These are extremely helpful discussions for setting up context and improving 
the design, but they are not necessarily decisions we have to make 
immediately. Let us think about the broader community members we can empower 
and bring in by enabling this S0 improvement.
   
   Thank you, everyone, for the discussions so far, and let us work together to 
enable our community.

