[GitHub] [tvm] tqchen commented on a change in pull request #9727: [TE][TIR] Implement layout transformations, non-flat memory buffers

GitBox Thu, 03 Mar 2022 09:43:20 -0800


tqchen commented on a change in pull request #9727:
URL: https://github.com/apache/tvm/pull/9727#discussion_r818906560




##########
File path: include/tvm/tir/builtin.h
##########
@@ -105,10 +105,10 @@ TVM_DLL const Op& large_uint_imm();
 TVM_DLL const Op& q_multiply_shift();
 
 /*!
- * \brief See pesudo code
+ * \brief See pseudo code
  *
- *  Handle address_of(Load *op) {
- *     return &op->buffer_var[index];

Review comment:
       Thanks @Lunderberg To elaborate a bit further, `address_of` is a very 
low-level primitive and appears at a later stage of the buffer. 
   
   I believe what we are trying to say instead is that "there is a need to 
build a mechanism to take possible buffer subregion's head address so we can 
plug them into intrinsics"
   
   While we would find need of taking address of multiple dimensional buffers, 
the specific design of the instrinsic may not be very desirable at early stage 
of scheduling, especially when it comes to the case when buffer layout and 
overall allocation regions changes.
   
   Consider the following example:
   
   ```python
   @T.prim_func
   def myfunc(A: T.Buffer([4,4], "float32"), C: T.Buffer([4,4], "float32")):
        B = T.alloc_buffer([4,4], "float32")
        for i, j in T.grid(2, 2):
              load_2x2_matrix(address_of(B[i*2, j*2], A, i*2, j*2)
              exp_2x2_matrix(address_of(B[i*2, j*2])
              store_2x2_matrix(address_of(B[i*2, j*2], C, i*2, j*2)
   ```
   In this case, the original intention was to load a 2x2 region onto B, 
perform some arithmetic operations, then store it back to C.  The main problem 
of this program is how does the address changes as we start to transform the 
program, say if we swap i and j axis of B, or in this case, what if we compact 
the layout of B to make it 2x2(as only 2x2 region is touched in each of the 
loop iterator).
   
   Because `address_of` is a very low level construct, it is hard to clarify 
what does it mean to update the IR so that the address_of still makes sense. 
Additionally, there are also some unspoken cosntraints, for example, in many 
intrinsics there is a restriction that the submatrix have to stay continguous 
in memory. Which means B have to follow layout `lambda i, j: i//2, j // 2, i 
%2, j%2` to be lowered.
   
   The high level message is that `address_of` alone is not enough to capture 
the **intention** of slicing a submatrix with possible addressing constraints. 
TIR introduced a specific construct (buffer subregion match) to address this 
problem. Which matches a subregion to a new buffer, and the buffer specified 
all the possible constraints that backend might want to impose.
   
   Process-wise, restricting the semantics to match the existing behavior would 
meet our need, but also not open up doors for possible future problems. It 
could be possible that after some deliberation we still think that `address_of` 
for multi-dimensional access is needed, then we can update and generalize at 
that time point
    




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [tvm] tqchen commented on a change in pull request #9727: [TE][TIR] Implement layout transformations, non-flat memory buffers

Reply via email to