mbrookhart opened a new pull request, #12889:
URL: https://github.com/apache/tvm/pull/12889

   cc @AndrewZhaoLuo @honghuichao
   
   I attempted to support non-constant scales and zero points in FQ2I to fix 
the problem in #12707. The transformation itself works: the graph is rewritten 
as I'd expect, from this:
   ```
   def @main(%x0: Tensor[(1, 4), int8] /* ty=Tensor[(1, 4), int8] */, %x1: 
Tensor[(1, 4), int8] /* ty=Tensor[(1, 4), int8] */, %x2: Tensor[(1, 4), int8] 
/* ty=Tensor[(1, 4), int8] */, %x3: Tensor[(1, 4), int8] /* ty=Tensor[(1, 4), 
int8] */) -> Tensor[(1, 16), int8] {
     %0 = add(0f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
     %1 = multiply(0 /* ty=int32 */, 1 /* ty=int32 */) /* ty=int32 */;
     %2 = add(1f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
     %3 = add(2f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
     %4 = add(3f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
     %5 = qnn.dequantize(%x0, %0, %1) /* ty=Tensor[(1, 4), float32] */;
     %6 = qnn.dequantize(%x1, %2, %1) /* ty=Tensor[(1, 4), float32] */;
     %7 = qnn.dequantize(%x2, %3, %1) /* ty=Tensor[(1, 4), float32] */;
     %8 = qnn.dequantize(%x3, %4, %1) /* ty=Tensor[(1, 4), float32] */;
     %9 = (%5, %6, %7, %8) /* ty=(Tensor[(1, 4), float32], Tensor[(1, 4), 
float32], Tensor[(1, 4), float32], Tensor[(1, 4), float32]) */;
     %10 = concatenate(%9, axis=1) /* ty=Tensor[(1, 16), float32] */;
     %11 = add(3f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
     qnn.quantize(%10, %11, %1, out_dtype="int8") /* ty=Tensor[(1, 16), int8] */
   }
   ```
   
   To this:
   
   ```
   def @main(%x0: Tensor[(1, 4), int8] /* ty=Tensor[(1, 4), int8] */, %x1: 
Tensor[(1, 4), int8] /* ty=Tensor[(1, 4), int8] */, %x2: Tensor[(1, 4), int8] 
/* ty=Tensor[(1, 4), int8] */, %x3: Tensor[(1, 4), int8] /* ty=Tensor[(1, 4), 
int8] */) -> Tensor[(1, 16), int8] {
     %0 = add(0f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
     %1 = add(1f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
     %2 = add(2f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
     %3 = add(3f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
     %4 = multiply(0 /* ty=int32 */, 1 /* ty=int32 */) /* ty=int32 */;
     %5 = (%x0, %x1, %x2, %x3) /* ty=(Tensor[(1, 4), int8], Tensor[(1, 4), 
int8], Tensor[(1, 4), int8], Tensor[(1, 4), int8]) */;
     %6 = (%0, %1, %2, %3) /* ty=(float32, float32, float32, float32) */;
     %7 = (%4, %4, %4, %4) /* ty=(int32, int32, int32, int32) */;
     %8 = add(3f /* ty=float32 */, 0.5f /* ty=float32 */) /* ty=float32 */;
     qnn.concatenate(%5, %6, %7, %8, %4, axis=1) /* ty=Tensor[(1, 16), int8] */
   }
   ```
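   For reference, the equivalence the pass relies on can be sketched in plain NumPy (a hand-written sketch of the standard affine quantize/dequantize math, not TVM code; the scale and zero-point values mirror the `i + 0.5` / `0 * 1` constants in the IR above). Because quantization is elementwise, quantizing after the concat is the same as quantizing each input to the output params and concatenating in the integer domain, which is what `qnn.concatenate` does:
   ```python
   import numpy as np

   def dequantize(q, scale, zp):
       # standard affine dequantization: x = (q - zp) * scale
       return (q.astype(np.float32) - zp) * scale

   def quantize(x, scale, zp, lo=-128, hi=127):
       # standard affine quantization: q = clip(round(x / scale) + zp)
       return np.clip(np.round(x / scale) + zp, lo, hi).astype(np.int8)

   rng = np.random.default_rng(0)
   xs = [rng.integers(-128, 128, size=(1, 4), dtype=np.int8) for _ in range(4)]
   scales = [0.5, 1.5, 2.5, 3.5]  # i + 0.5, as in the IR above
   zp = 0                         # 0 * 1, as in the IR above

   # First graph: dequantize each input, concatenate, then quantize.
   ref = quantize(
       np.concatenate([dequantize(x, s, zp) for x, s in zip(xs, scales)], axis=1),
       scales[3], zp)

   # Second graph: requantize each input to the output scale/zero point,
   # then concatenate entirely in the integer domain.
   out = np.concatenate(
       [quantize(dequantize(x, s, zp), scales[3], zp) for x, s in zip(xs, scales)],
       axis=1)

   assert np.array_equal(ref, out)
   ```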
   
   But the test I wrote fails because requantization doesn't support 
non-constant scales and zero points, and the concat operation calls requantize 
under the hood. I'm not sure how soon I will get to fixing that issue.
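   The reason constant scales matter there: requantize lowering typically folds the ratio `in_scale / out_scale` into a fixed-point multiplier and shift at compile time, so the runtime needs no float math. A simplified sketch of that trick (rounding details omitted; the names here are illustrative, not TVM's actual lowering):
   ```python
   import math

   def fixed_point_multiplier(ratio):
       # Decompose a *constant* float ratio into a Q31 fixed-point
       # multiplier and an exponent: ratio = mantissa * 2**exponent.
       mantissa, exponent = math.frexp(ratio)
       return round(mantissa * (1 << 31)), exponent

   def requantize(q, in_scale, in_zp, out_scale, out_zp):
       # Constant-scale path: the ratio is folded ahead of time, so the
       # per-element work below is integer-only.
       mult, exp = fixed_point_multiplier(in_scale / out_scale)
       scaled = ((q - in_zp) * mult) >> (31 - exp)
       return max(-128, min(127, scaled + out_zp))
   ```
   With a non-constant scale the ratio is only known at run time, so this folding is impossible and the lowering would need a genuinely different (runtime float or runtime fixed-point-conversion) path.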
   
   This might not be a problem for the graph originally proposed in the issue, 
since it seems everything has the same scale/zero point, but it's not a general 
solution with the current backend.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]