LeiWang1999 opened a new issue, #14100: URL: https://github.com/apache/tvm/issues/14100
Here is the code for TVM's affine map, which is used to determine the extent of loop arguments: https://github.com/apache/tvm/blob/a3b51f11b26d721980858091c2df4c7487388c12/src/arith/iter_affine_map.cc#L1868-L1941 We can see that the extent of the loop arguments can be inferred using the expression extent *= new_arg->extent;. In most cases, this approach works correctly because the loop arguments are usually consecutive. For example: ```python v3 = T.axis.spatial(16, ic_0 * 4 + ax2) # ax2 has extent 4, ic_0 has extent 4 ``` The total extent of the loop is 16 because the maximum value of ic_0 is 3, and the upper bound of 3 * 4 + ax2 is 16. We can also compute the total extent using the DivideIterSumExpr function, which will give us ic_0's extent times ax2's extent, which is 4 * 4 and also equals 16. However, this approach is not always correct because sometimes the scale value of ic_0 is not equal to the extent of ax2, like in the following example: ```python v0 = T.axis.spatial(56, h_w_fused // 54 + kh) ``` Here, the extent of kh is 3, the extent of (h_w_fused // 54) is 54, and after blockize transform, the extent will be computed as 54 * 3 -> 162. However, the correct extent of this expression is (54-1) * 1 + 3). To make the program correct, v0 should be: ```python v0 = T.axis.spatial(56, kh * 54 + h_w_fused // 54) ``` For more information, please see https://discuss.tvm.apache.org/t/extra-loop-scale-when-do-tensorize-with-tir/14200. In my opinion, the correct derivation code should be: ```cpp PrimExpr extent = make_const(dtype, 0); std::vector<IterSplitExpr> outer_args, inner_args; bool inner = true, scale_is_one = false; // we check in inverse order so we can visit from inner to outer for (auto it = expr->args.rbegin(); it != expr->args.rend(); ++it) { const IterSplitExpr& arg = *it; if (is_one(arg->scale)) scale_is_one = true; DivisionResult arg_division = DivideIterSplitExpr(arg); IterSplitExpr new_arg; if (arg_division.IsInner()) { // LOG(INFO) << "arg_division is inner"; if (!inner) { unresolved_count_++; return DivisionResult::Failure(); } new_arg = arg_division.GetInnerAsSplit(); inner_args.push_back(new_arg); inner = true; } else if (arg_division.IsOuter()) { // LOG(INFO) << "arg_division is outer"; new_arg = arg_division.GetOuterAsSplit(); outer_args.push_back(new_arg); inner = false; } else { unresolved_count_++; return DivisionResult::Failure(); } LOG(INFO) << "args.extent " << new_arg->extent << " args.scale " << new_arg->scale; extent += ((new_arg->extent - 1) * new_arg->scale); } extent += 1; ``` After making the changes I have mentioned, my code started working correctly. It is possible that the original approach to determining the extent of loop arguments was incorrect and resulted in a bug. However, it is also possible that the original approach was correct in most cases but failed in certain edge cases, such as the one I encountered, Any discussion? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
