Johnson9009 commented on issue #8432: URL: https://github.com/apache/tvm/issues/8432#issuecomment-878353106
After a day of debugging, I found the root cause of this issue: it is the `if` statement below. https://github.com/apache/tvm/blob/1d7a9e9dff1983cd4b12104f93b32bfe4b4c4f7e/src/relay/transforms/type_infer.cc#L557-L559

According to the git history, it was added by PR #2437. Using the test case above to explain why the issue happens: before the 1st invocation of the `InferType` pass, the types of the function look something like this.

```
def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> IncompleteTypeNode(0, 0xYYYYYY) {
  %0 = nn.pad(%x1, 0 /* ty=int32 */, pad_width=[[0, 0], [3, 3], [3, 3], [0, 0]]) /* ty=IncompleteTypeNode(0, 0xYYYYYY) */;
  nn.conv2d(%0, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=IncompleteTypeNode(0, 0xYYYYYY) */
}
```

The return type of the function is not yet defined, so `f->ret_type.defined()` evaluates to `false`, and everything is fine. After the 1st invocation of the `InferType` pass, the types of the function look something like this.

```
def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> Tensor[(1, 224, 224, 64), int32] {
  %0 = nn.pad(%x1, 0 /* ty=int32 */, pad_width=[[0, 0], [3, 3], [3, 3], [0, 0]]) /* ty=Tensor[(1, 230, 230, 3), int8] */;
  nn.conv2d(%0, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 224, 224, 64), int32] */
}
```

After my `SimplifyPad` pass removes the `nn.pad` operator, the 2nd invocation of the `InferType` pass happens, and the types of the function at this point look something like this.
```
def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> Tensor[(1, 224, 224, 64), int32] {
  nn.conv2d(%x1, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=IncompleteTypeNode(0, 0xYYYYYY) */
}
```

The key difference at this point is the return type of the function: it is now defined as `Tensor[(1, 224, 224, 64), int32]`. Because the last expression of the function is the `nn.conv2d` call, the return type of `nn.conv2d` is also the return type of the function. Due to the code at lines 557~559, the return type of `nn.conv2d` is changed from `IncompleteTypeNode(0, 0xYYYYYY)` to `Tensor[(1, 224, 224, 64), int32]` as well. Then `Conv2DRel` is called to infer the return type of this `nn.conv2d`, but the corresponding entry of its `types` parameter is `Tensor[(1, 224, 224, 64), int32]` instead of `IncompleteTypeNode(0, 0xYYYYYY)`. The type-inference logic of `Conv2DRel` concludes that the return type of `nn.conv2d` should be `Tensor[(1, 218, 218, 64), int32]`, so `tvm::relay::TypeSolver::Unifier::Unify` sees the return type of `nn.conv2d` inferred as two different types and reports the error message.

From this analysis, the issue will be triggered whenever a pass changes the shape of the return type of a Relay function; conversely, if a pass does not change the final shape of the function's return type, the issue will not occur.

@slyubomirsky @jroesch @tqchen I don't know whether we can simply remove the `if` statement at L557-L559 to fix this issue. What are your opinions? Thanks.
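The failure mode can be illustrated without TVM at all. Below is a minimal Python sketch (all names here are hypothetical stand-ins, not TVM APIs: `Incomplete` mimics `IncompleteTypeNode`, `unify` mimics `TypeSolver::Unifier::Unify`, and `conv2d_rel` mimics `Conv2DRel` for a stride-1, unpadded NHWC convolution) showing how seeding the call's output slot with the function's stale return annotation makes unification fail, and how leaving the slot incomplete, which is the effect of removing the `if`, lets inference succeed.

```python
class Incomplete:
    """Stand-in for Relay's IncompleteTypeNode: a yet-unknown type."""
    def __repr__(self):
        return "IncompleteType"

def unify(a, b):
    """Toy unifier: an incomplete slot resolves to the concrete type;
    two concrete types must be identical, otherwise it is an error."""
    if isinstance(a, Incomplete):
        return b
    if isinstance(b, Incomplete):
        return a
    if a != b:
        raise TypeError(f"unable to unify: {a} vs {b}")
    return a

def conv2d_rel(in_shape, kernel_hw, out_channels=64):
    """Toy type relation: with no padding and stride 1, each spatial
    dimension shrinks by (kernel - 1)."""
    n, h, w, _ = in_shape
    kh, kw = kernel_hw
    return (n, h - kh + 1, w - kw + 1, out_channels)

def infer(seed_output_with_ret_type):
    # After SimplifyPad, conv2d consumes the unpadded 224x224 input,
    # but the function still carries the return annotation written
    # by the 1st InferType run (when nn.pad was still present).
    stale_ret_type = (1, 224, 224, 64)
    out_slot = stale_ret_type if seed_output_with_ret_type else Incomplete()
    fresh = conv2d_rel((1, 224, 224, 3), (7, 7))  # (1, 218, 218, 64)
    return unify(out_slot, fresh)

# With the L557-559 seeding: two different concrete types collide.
try:
    infer(seed_output_with_ret_type=True)
except TypeError as e:
    print("error:", e)

# Without the seeding: the incomplete slot simply takes the fresh type.
print("ok:", infer(seed_output_with_ret_type=False))
```

This is only a model of the mechanism described above, but it matches the observed behavior: the error appears exactly when the output slot is pre-filled with a concrete type that the relation then contradicts.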
