LeiWang1999 opened a new issue, #14100:
URL: https://github.com/apache/tvm/issues/14100

   Here is the code for TVM's affine map, which is used to determine the extent 
of loop arguments:
   
   
https://github.com/apache/tvm/blob/a3b51f11b26d721980858091c2df4c7487388c12/src/arith/iter_affine_map.cc#L1868-L1941
   
   We can see that the extent of the loop arguments can be inferred using the 
expression extent *= new_arg->extent;. In most cases, this approach works 
correctly because the loop arguments are usually consecutive. For example:
   
   ```python
   v3 = T.axis.spatial(16, ic_0 * 4 + ax2)
   # ax2 has extent 4, ic_0 has extent 4
   ```
   
   The total extent of the loop is 16 because the maximum value of ic_0 is 3, 
and the upper bound of 3 * 4 + ax2 is 16. We can also compute the total extent 
using the DivideIterSumExpr function, which will give us ic_0's extent times 
ax2's extent, which is 4 * 4 and also equals 16.
   
   However, this approach is not always correct because sometimes the scale 
value of ic_0 is not equal to the extent of ax2, like in the following example:
   
   ```python
   v0 = T.axis.spatial(56, h_w_fused // 54 + kh)
   ```
   
   Here, the extent of kh is 3, the extent of (h_w_fused // 54) is 54, and 
after blockize transform, the extent will be computed as 54 * 3 -> 162. 
However, the correct extent of this expression is (54-1) * 1 + 3). To make the 
program correct, v0 should be:
   
   ```python
   v0 = T.axis.spatial(56, kh * 54 + h_w_fused // 54)
   ```
   
   For more information, please see 
https://discuss.tvm.apache.org/t/extra-loop-scale-when-do-tensorize-with-tir/14200.
   
   In my opinion, the correct derivation code should be:
   
   ```cpp
       PrimExpr extent = make_const(dtype, 0);
       std::vector<IterSplitExpr> outer_args, inner_args;
       bool inner = true, scale_is_one = false;
       // we check in inverse order so we can visit from inner to outer
       for (auto it = expr->args.rbegin(); it != expr->args.rend(); ++it) {
         const IterSplitExpr& arg = *it;
         if (is_one(arg->scale)) scale_is_one = true;
         DivisionResult arg_division = DivideIterSplitExpr(arg);
         IterSplitExpr new_arg;
         if (arg_division.IsInner()) {
           // LOG(INFO) << "arg_division is inner";
           if (!inner) {
             unresolved_count_++;
             return DivisionResult::Failure();
           }
           new_arg = arg_division.GetInnerAsSplit();
           inner_args.push_back(new_arg);
           inner = true;
         } else if (arg_division.IsOuter()) {
           // LOG(INFO) << "arg_division is outer";
           new_arg = arg_division.GetOuterAsSplit();
           outer_args.push_back(new_arg);
           inner = false;
         } else {
           unresolved_count_++;
           return DivisionResult::Failure();
         }
         LOG(INFO) << "args.extent " << new_arg->extent << " args.scale " << 
new_arg->scale;
         extent += ((new_arg->extent - 1) * new_arg->scale);
       }
       extent += 1;
   ```
   
   After making the changes I have mentioned, my code started working 
correctly. It is possible that the original approach to determining the extent 
of loop arguments was incorrect and resulted in a bug. However, it is also 
possible that the original approach was correct in most cases but failed in 
certain edge cases, such as the one I encountered, 
   
   Any discussion?
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to