mbrookhart opened a new pull request, #12889:
URL: https://github.com/apache/tvm/pull/12889
cc @AndrewZhaoLuo @honghuichao
I attempted to support non-constant scales and zero points in FQ2I to fix
the problem in #12707. This works: the graph gets transformed as I'd expect,
from this:
```
def @main(%x0: Tensor[(1, 4), int8] /* ty=Tensor[(1, 4), int8] */, %x1:
Tensor[(1, 4), int8] /* ty=Tensor[(1, 4), int8] */, %x2: Tensor[(1, 4), int8]
/* ty=Tensor[(1, 4), int8] */, %x3: Tensor[(1, 4), int8] /* ty=Tensor[(1, 4),
int8] */) -> Tensor[(1, 16), int8] {
%0 = add(0f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
%1 = multiply(0 /* ty=int32 */, 1 /* ty=int32 */) /* ty=int32 */;
%2 = add(1f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
%3 = add(2f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
%4 = add(3f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
%5 = qnn.dequantize(%x0, %0, %1) /* ty=Tensor[(1, 4), float32] */;
%6 = qnn.dequantize(%x1, %2, %1) /* ty=Tensor[(1, 4), float32] */;
%7 = qnn.dequantize(%x2, %3, %1) /* ty=Tensor[(1, 4), float32] */;
%8 = qnn.dequantize(%x3, %4, %1) /* ty=Tensor[(1, 4), float32] */;
%9 = (%5, %6, %7, %8) /* ty=(Tensor[(1, 4), float32], Tensor[(1, 4),
float32], Tensor[(1, 4), float32], Tensor[(1, 4), float32]) */;
%10 = concatenate(%9, axis=1) /* ty=Tensor[(1, 16), float32] */;
%11 = add(3f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
qnn.quantize(%10, %11, %1, out_dtype="int8") /* ty=Tensor[(1, 16), int8] */
}
```
To this:
```
def @main(%x0: Tensor[(1, 4), int8] /* ty=Tensor[(1, 4), int8] */, %x1:
Tensor[(1, 4), int8] /* ty=Tensor[(1, 4), int8] */, %x2: Tensor[(1, 4), int8]
/* ty=Tensor[(1, 4), int8] */, %x3: Tensor[(1, 4), int8] /* ty=Tensor[(1, 4),
int8] */) -> Tensor[(1, 16), int8] {
%0 = add(0f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
%1 = add(1f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
%2 = add(2f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
%3 = add(3f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
%4 = multiply(0 /* ty=int32 */, 1 /* ty=int32 */) /* ty=int32 */;
%5 = (%x0, %x1, %x2, %x3) /* ty=(Tensor[(1, 4), int8], Tensor[(1, 4),
int8], Tensor[(1, 4), int8], Tensor[(1, 4), int8]) */;
%6 = (%0, %1, %2, %3) /* ty=(float32, float32, float32, float32) */;
%7 = (%4, %4, %4, %4) /* ty=(int32, int32, int32, int32) */;
%8 = add(3f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
qnn.concatenate(%5, %6, %7, %8, %4, axis=1) /* ty=Tensor[(1, 16), int8] */
}
```
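For anyone following along, the float semantics that FQ2I is preserving here can be sketched in plain Python. This is a hedged illustration of the dequantize → concatenate → quantize reference path, not TVM code; the input values are made up, and only the scales (`i + 0.5`) and zero point (`0`) mirror the example above:

```python
# Reference semantics of the pre-transform graph: dequantize each int8 input
# with its own scale, concatenate, then quantize with the output scale.
def dequantize(q, scale, zp):
    return [(v - zp) * scale for v in q]

def quantize(x, scale, zp, qmin=-128, qmax=127):
    return [max(qmin, min(qmax, round(v / scale) + zp)) for v in x]

# Hypothetical inputs; scales i + 0.5 and zero point 0 match the example IR.
inputs = [[1, 2, 3, 4]] * 4
scales = [i + 0.5 for i in range(4)]
zp = 0

floats = []
for q, s in zip(inputs, scales):
    floats += dequantize(q, s, zp)  # concatenate along the flattened axis

out = quantize(floats, scales[3], zp)  # output scale is 3.5f in the example
```

The fused `qnn.concatenate` has to produce the same result directly in int8, which is exactly why it requantizes each input to the output scale under the hood.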
But the test I wrote fails because requantization doesn't support
non-constant scales and zero points, and the concat operation calls requantize
under the hood. I'm not sure how soon I will get to fixing that issue.
This might not be a problem for the graph originally proposed in the issue,
since everything there appears to share the same scale/zero point, but it's
not a general solution with the current backend.
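For context on why requantize wants constant scales: as I understand it, the lowering folds the scale ratio into a fixed-point multiplier and right shift at compile time, so the scales must be known constants. A rough sketch of that arithmetic (an illustrative Python approximation of the general technique, not the actual TVM implementation):

```python
import math

def fixed_point_multiplier(ratio, bits=31):
    # Decompose ratio as mantissa * 2**exponent with mantissa in [0.5, 1),
    # then scale the mantissa to a 31-bit integer. In a real lowering this
    # happens at COMPILE time, hence the constant-scale requirement.
    mantissa, exponent = math.frexp(ratio)
    return round(mantissa * (1 << bits)), exponent

def requantize(q_in, s_in, zp_in, s_out, zp_out, qmin=-128, qmax=127):
    # Compile-time constants derived from the two scales.
    mult, exp = fixed_point_multiplier(s_in / s_out)
    shift = 31 - exp
    out = []
    for v in q_in:
        # (v - zp_in) * s_in / s_out, computed in integer arithmetic:
        # multiply by the fixed-point mantissa, round, shift back down.
        scaled = (v - zp_in) * mult
        rounded = (scaled + (1 << (shift - 1))) >> shift
        out.append(max(qmin, min(qmax, rounded + zp_out)))
    return out
```

With a runtime (non-constant) scale, the multiplier/shift pair can't be precomputed, which is the part that would need reworking.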
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]