junrushao1994 commented on a change in pull request #22:
URL: https://github.com/apache/tvm-rfcs/pull/22#discussion_r704768196



##########
File path: rfcs/0022-tir-non-scalar-constants.md
##########
@@ -0,0 +1,107 @@
+
+- Feature Name: tir_non_scalar_constants
+- Start Date: 2021-06-01
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/22
+- GitHub Issue: TBD
+
+# 1. Summary
+
+This RFC proposes how non-scalar constants can be represented in TIR and used by passes during lowering.
+
+# 2. Motivation 
+
+Currently, non-scalar constants can be represented in Relay (relay.Constant) and used by Relay passes, but not in TIR. Therefore, when performing lowering using TIR passes, we have to maintain a side channel mapping tir::Var to constant non-scalar data in order to perform transformations that rely on knowing that some of the data are constants.
+
+A few example scenarios as further motivation:
+
+## Weight compression
+
+When lowering for accelerators (e.g. the [Arm(R) Ethos(TM)-U NPU](https://github.com/apache/tvm-rfcs/pull/11)), certain operations need to be tiled to co-optimize performance and memory utilization. Such tiling patterns create slices of weights that need compressing, and the compressed slices end up with varying sizes. Therefore, knowing that some tir::Vars refer to constants is critical at the TIR level to perform this.
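As a toy illustration of why the compressed weight slices end up with varying sizes (plain Python with zlib; the tensor shape, tiling pattern, and codec are all made up for illustration and are not what an NPU compiler actually uses):

```python
import zlib
import numpy as np

# Toy weight tensor: the top half is zeros (easily compressible), the
# bottom half is dense noise, so tiles compress to different sizes.
rng = np.random.default_rng(0)
weights = np.zeros((16, 16), dtype=np.int8)
weights[8:, :] = rng.integers(-8, 8, size=(8, 16), dtype=np.int8)

# Tile the weights as a scheduler might, then compress each slice.
tiles = [weights[i:i + 4, :] for i in range(0, 16, 4)]
compressed = [zlib.compress(t.tobytes()) for t in tiles]

# The compressed slices have varying sizes, so their extents are only
# known once the compiler can see which buffers hold constant data.
sizes = [len(c) for c in compressed]
assert sizes[0] < sizes[2]  # an all-zero tile compresses smaller than a dense one

# Round-trip check: every compressed slice decodes back exactly.
for t, c in zip(tiles, compressed):
    assert zlib.decompress(c) == t.tobytes()
```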
+
+## Memory Planning
+
+The TIR program can express both inter- and intra-operator memory requirements post-scheduling, as explained further by the [Unified Static Memory Planning RFC](https://github.com/apache/tvm-rfcs/pull/9). It would be better if the constants could be embedded in the TIR PrimFunc. Moreover, this allows various [target-dependent lowerings](https://github.com/apache/tvm-rfcs/pull/10) to produce TIR PrimFuncs with constants in them.
+
+## Winograd Constants
+
+The Winograd transformation (used for fast GEMMs) involves multiplication by a 
hard-coded constant tensor. This is currently accomplished in TE using a 
complicated TE compute expression with many nested selects. Being able to 
directly express a constant tensor here would significantly simplify this code.
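As a plain-NumPy illustration of the contrast (the matrix below is the standard F(2,3) Winograd input-transform matrix; the select-based encoding only mimics the shape of the current nested-select TE expression, not its exact code):

```python
import numpy as np

# The F(2,3) Winograd input-transform matrix B^T, written directly
# as the constant tensor this RFC would allow us to embed.
BT = np.array([
    [1,  0, -1,  0],
    [0,  1,  1,  0],
    [0, -1,  1,  0],
    [0,  1,  0, -1],
], dtype=np.float32)

# The same matrix encoded element-by-element with chained selects,
# mimicking how a hard-coded tensor must be expressed in TE today.
def bt_select(i, j):
    if (i, j) in {(0, 0), (1, 1), (1, 2), (2, 2), (3, 1)}:
        return 1.0
    if (i, j) in {(0, 2), (2, 1), (3, 3)}:
        return -1.0
    return 0.0

selected = np.array([[bt_select(i, j) for j in range(4)] for i in range(4)],
                    dtype=np.float32)
assert (selected == BT).all()
```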
+
+
+# 3. Guide-level explanation
+
+This is not particularly a user-facing feature; it allows constants to be 'linked' into TIR. Initially, we plan to gate this behind the '-link-params' argument to relay.build and TVMC.
+
+# 4. Reference-level explanation
+
+The proposal is quite simple and can be explained as follows:
+
+```python
+@tvm.script.tir
+def myfunc():
+   param = tir.allocate_const([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], "int32", [10])
+```
+
+This closely follows the semantics of tir.allocate, the difference being that it represents a buffer filled with constants.
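As a rough sketch of the intended semantics, here is a hypothetical plain-Python analogue (these are not TVM's actual node definitions; the class names and fields are illustrative only):

```python
from dataclasses import dataclass
from typing import List
import numpy as np

# Hypothetical mini-IR: Allocate reserves uninitialized scratch space,
# while AllocateConst additionally carries the constant payload, so
# passes can see that the buffer's contents are known at compile time.

@dataclass
class Allocate:
    buffer_var: str
    dtype: str
    extents: List[int]

@dataclass
class AllocateConst(Allocate):
    data: np.ndarray = None  # the embedded constant tensor

# Analogue of: param = tir.allocate_const([1]*10, "int32", [10])
param = AllocateConst("param", "int32", [10], np.ones(10, dtype="int32"))
assert param.extents == [10]
assert int(param.data.sum()) == 10
```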

Review comment:
       Thanks for answering my questions! It is super helpful for me to 
understand the motivation and scope of this RFC :-)
   
   I like the idea of having `tir.allocate_constant`, which I dreamed of before, and of course am not going to be against it. I was just really cautious about extending an IR with new nodes, because it usually means breaking most of the passes, which usually don't handle unknown node types, and this might impact the overall TVM stack, as well as the compilation workflow for various architectures and use cases.
   
   Here are my thoughts (I'm sure I am limited in certain aspects, so please 
feel free to disagree and propose alternatives):
   
   A. To be consistent with the unified IR effort, what about using the IRModule-level attributes that @electriclilies has been pushing for recently, and putting those parameters in IRModule attributes? These could be translated directly into the `.rodata` section in LLVM-based codegen, which both the CPU target and the CUDA host target use. We can do some small refactors to unify Relay- and TIR-level tensor constants under IRModule attributes.
   
   B. `tir.allocate_constant` needs proper support from the TVMScript parser and printer; otherwise the round-trip workflow will be broken. In this case, we might need to think about supporting parsing tensors (or tensor metadata, if the actual parameters are not linked in yet).
   
   C. On architectures other than CPUs, like GPUs, my quick thought is: at least we can lower this into a call that copies a chunk of data onto the GPU using TVM's C API. Let me know if that makes sense.
   
   D. If I understand correctly, because the data is generated into the `.rodata` section, which is loaded directly into memory, it's not going to be on the stack (correct me if I'm wrong). There is probably also a zero-copy mechanism, so we don't waste any extra copy - which sounds cool to me.
   
   E. On Winograd weights, my second thought is: if we are able to identify that a buffer is constant in `tir.Load`, then we can actually simplify the `tir.Load` into its constant value - which IMO perfectly resolves the question I asked above. What do you think?
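A toy sketch of the folding suggested in point E, using a hypothetical mini-IR (none of these classes are TVM's real node types; they only illustrate the idea of replacing a load from a known-constant buffer with a literal):

```python
from dataclasses import dataclass
from typing import Dict, Union
import numpy as np

# Hypothetical mini-IR nodes for illustration only.
@dataclass
class IntImm:
    value: int

@dataclass
class Load:
    buffer_var: str
    index: int

Expr = Union[IntImm, Load]

def fold_constant_loads(expr: Expr, const_buffers: Dict[str, np.ndarray]) -> Expr:
    """Replace a load from a known-constant buffer at a constant index
    with the literal value it holds."""
    if isinstance(expr, Load) and expr.buffer_var in const_buffers:
        return IntImm(int(const_buffers[expr.buffer_var][expr.index]))
    return expr

# A constant buffer (e.g. a row of a Winograd transform matrix).
consts = {"bt": np.array([1, 0, -1, 0], dtype="int32")}
folded = fold_constant_loads(Load("bt", 2), consts)
assert isinstance(folded, IntImm)
assert folded.value == -1
```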



