Johnson9009 commented on issue #8432:
URL: https://github.com/apache/tvm/issues/8432#issuecomment-878353106


   After a day of debugging, I found the root cause of this issue: it is the 
`if` statement below.
   
https://github.com/apache/tvm/blob/1d7a9e9dff1983cd4b12104f93b32bfe4b4c4f7e/src/relay/transforms/type_infer.cc#L557-L559
   According to the git history, it was added by PR #2437.
   
   Using the above test case to describe why the issue happens: before the 1st 
invocation of the "InferType" pass, the types in the function look something 
like this.
   ```
   def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), 
int8]) -> IncompleteTypeNode(0, 0xYYYYYY) {
     %0 = nn.pad(%x1, 0 /* ty=int32 */, pad_width=[[0, 0], [3, 3], [3, 3], [0, 
0]]) /* ty=IncompleteTypeNode(0, 0xYYYYYY) */;
     nn.conv2d(%0, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", 
kernel_layout="HWIO", out_dtype="int32") /* ty=IncompleteTypeNode(0, 0xYYYYYY) 
*/
   }
   ```
   The return type of the function is not defined, so `f->ret_type.defined()` 
evaluates to `false`, and everything works. After the 1st invocation of the 
"InferType" pass, the types in the function look something like this.
   ```
   def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), 
int8]) -> Tensor[(1, 224, 224, 64), int32] {
     %0 = nn.pad(%x1, 0 /* ty=int32 */, pad_width=[[0, 0], [3, 3], [3, 3], [0, 
0]]) /* ty=Tensor[(1, 230, 230, 3), int8] */;
     nn.conv2d(%0, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", 
kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 224, 224, 64), int32] 
*/
   }
   ```
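   As a sanity check on the shapes above (my own arithmetic, not TVM output), the inferred types follow from the usual "valid" convolution formula:

   ```python
   # Sanity-check the shape arithmetic in the inferred types above (NHWC,
   # stride 1, no dilation): out = in + pad_before + pad_after - kernel + 1.
   def conv2d_out_dim(in_dim, kernel, pad_before=0, pad_after=0, stride=1):
       return (in_dim + pad_before + pad_after - kernel) // stride + 1

   padded = 224 + 3 + 3                      # nn.pad grows H and W to 230
   assert padded == 230
   assert conv2d_out_dim(padded, 7) == 224   # conv2d on the padded input
   assert conv2d_out_dim(224, 7) == 218      # conv2d without nn.pad -> 218
   ```

   The 218 in the last line is exactly the conflicting shape that appears once the "nn.pad" is removed.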
   After my pass "SimplifyPad" removes the "nn.pad" operator, the 2nd 
invocation of the "InferType" pass happens. The types in the function at this 
point look something like this.
   ```
   def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), 
int8]) -> Tensor[(1, 224, 224, 64), int32] {
     nn.conv2d(%x1, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", 
kernel_layout="HWIO", out_dtype="int32") /* ty=IncompleteTypeNode(0, 0xYYYYYY) 
*/
   }
   ```
   The key difference this time is the return type of the function: it is now 
defined and is "Tensor[(1, 224, 224, 64), int32]". Because the last expression 
of this function is the "nn.conv2d" call, the return type of "nn.conv2d" is 
the return type of the function. Through the code at lines 557~559, the return 
type of "nn.conv2d" is therefore changed from "IncompleteTypeNode(0, 0xYYYYYY)" 
to "Tensor[(1, 224, 224, 64), int32]" as well.
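   A rough Python sketch (hypothetical names, not TVM's actual API) of what that `if` statement effectively does: when the function already has a defined return type, the type of its body is unified with it before the type relations run.

   ```python
   # Illustrative sketch (hypothetical names, not TVM's actual API) of the
   # effect of the cited if statement in type_infer.cc.
   class IncompleteType:
       """Stands in for IncompleteTypeNode: a not-yet-inferred type."""
       def defined(self):
           return False

   class TensorType:
       def __init__(self, shape, dtype):
           self.shape, self.dtype = tuple(shape), dtype
       def defined(self):
           return True
       def __eq__(self, other):
           return (isinstance(other, TensorType)
                   and (self.shape, self.dtype) == (other.shape, other.dtype))

   def unify(a, b):
       # Incomplete types unify with anything; two defined types must match.
       if isinstance(a, IncompleteType):
           return b
       if isinstance(b, IncompleteType):
           return a
       if a == b:
           return a
       raise TypeError("incompatible types")

   def infer_function_body_type(declared_ret_type, body_type):
       # Mirrors the spirit of type_infer.cc L557-L559:
       #   if (f->ret_type.defined()) rtype = this->Unify(f->ret_type, rtype, ...);
       if declared_ret_type.defined():
           body_type = unify(declared_ret_type, body_type)
       return body_type

   # On the 2nd InferType run the declared return type is already concrete,
   # so the incomplete type of the conv2d call is forced to that shape.
   resolved = infer_function_body_type(
       TensorType((1, 224, 224, 64), "int32"), IncompleteType())
   assert resolved.shape == (1, 224, 224, 64)
   ```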
   
   Then the function "Conv2DRel" is called to infer the return type of this 
"nn.conv2d", but the corresponding item in its "types" parameter is 
"Tensor[(1, 224, 224, 64), int32]" instead of "IncompleteTypeNode(0, 
0xYYYYYY)". The type inference logic of "Conv2DRel" concludes that the return 
type of "nn.conv2d" should be "Tensor[(1, 218, 218, 64), int32]", so 
"tvm::relay::TypeSolver::Unifier::Unify" sees the return type of "nn.conv2d" 
inferred as two different types and reports the error message.
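   To make the conflict concrete, here is a minimal sketch (assumed names, not TVM's real Unify implementation) of why two different concrete shapes cannot be unified:

   ```python
   # Minimal sketch (not TVM's real code) of the unification conflict: the
   # call's type was already forced to the function's declared return type,
   # but Conv2DRel re-derives a different concrete shape for the same call.
   def unify(lhs, rhs):
       # Two fully-defined shapes unify only if they are identical.
       if lhs != rhs:
           raise TypeError(f"incompatible types {lhs} and {rhs}")
       return lhs

   resolved_from_ret_type = (1, 224, 224, 64)  # copied from the signature
   derived_by_conv2d_rel = (1, 218, 218, 64)   # 224 - 7 + 1 = 218, no nn.pad

   try:
       unify(resolved_from_ret_type, derived_by_conv2d_rel)
   except TypeError as err:
       print("Unify failed:", err)
   ```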
   
   From this analysis, the issue occurs whenever a pass changes the shape of 
the return type of a Relay function; in other words, if a pass does not change 
the final shape of the function's return type, the issue is not triggered.
   
   @slyubomirsky @jroesch @tqchen I am not sure whether we can simply remove 
the `if` statement at L557-L559 to fix this issue. What are your opinions?
   
   Thanks.

