vvchernov opened a new issue, #13467: URL: https://github.com/apache/tvm/issues/13467
### Expected behavior

Support for a variable zero point in qnn batch_matmul.

### Actual behavior

The quantized [model](https://huggingface.co/philschmid/quantized-distilbert-banking77?text=I+like+you.+I+love+you) from Hugging Face fails during compilation because it uses a variable zero point. To observe this problem, the input types matching line from [issue #13466](https://github.com/apache/tvm/issues/13466) should be commented out. Part of the error log:

```
1: tvm::relay::qnn::QnnBatchMatmulCanonicalize(tvm::Attrs const&, tvm::runtime::Array<tvm::RelayExpr, void> const&, tvm::runtime::Array<tvm::Type, void> const&)
      at /home/user/Workshop/tvm/src/relay/qnn/op/batch_matmul.cc:179
0: int tvm::relay::GetScalarFromConstant<int>(tvm::RelayExpr)
      at /home/user/Workshop/tvm/src/relay/qnn/op/../../op/nn/../../transforms/pattern_utils.h:641
File "/home/user/Workshop/tvm/src/relay/transforms/./pattern_utils.h", line 641
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (n) is false: Expr must be a constant expr - #[version = "0.0.5"]
free_var %input_ids: Tensor[(1, 54), int64] /* ty=Tensor[(1, 54), int64] */;
%0 = less(%input_ids, 0i64 /* ty=int64 */) /* ty=Tensor[(1, 54), bool] */;
%1 = add(%input_ids, 30522i64 /* ty=int64 */) /* ty=Tensor[(1, 54), int64] */;
%2 = where(%0, %1, %input_ids) /* ty=Tensor[(1, 54), int64] */;
%3 = take(meta[relay.Constant][1] /* ty=Tensor[(30522, 768), float32] */, %2, axis=0) /* ty=Tensor[(1, 54, 768), float32] */;
%4 = add(%3, meta[relay.Constant][2] /* ty=Tensor[(1, 54, 768), float32] */) /* ty=Tensor[(1, 54, 768), float32] */;
%5 = mean(%4, axis=[-1], keepdims=True) /* ty=Tensor[(1, 54, 1), float32] */;
%6 = subtract(%4, %5) /* ty=Tensor[(1, 54, 768), float32] */;
%7 = power(%6, 2f /* ty=float32 */) /* ty=Tensor[(1, 54, 768), float32] */;
%8 = mean(%7, axis=[-1], keepdims=True) /* ty=Tensor[(1, 54, 1), float32] */;
%9 = add(%8, 1e-12f /* ty=float32 */) /* ty=Tensor[(1, 54, 1), float32] */;
%10 = sqrt(%9) /* ty=Tensor[(1, 54, 1), float32] */;
%11 = divide(%6, %10) /* ty=Tensor[(1, 54, 768), float32] */;
%12 = multiply(%11, meta[relay.Constant][3] /* ty=Tensor[(768), float32] */) /* ty=Tensor[(1, 54, 768), float32] */;
%13 = add(%12, meta[relay.Constant][4] /* ty=Tensor[(768), float32] */) /* ty=Tensor[(1, 54, 768), float32] */;
%14 = max(%13) /* ty=float32 */;
%15 = min(%13) /* ty=float32 */;
%16 = maximum(0f /* ty=float32 */, %14) /* ty=float32 */;
%17 = minimum(0f /* ty=float32 */, %15) /* ty=float32 */;
%18 = subtract(%16, %17) /* ty=float32 */;
%19 = divide(%18, 255f /* ty=float32 */) /* ty=float32 */;
%20 = divide(%13, %19);
%21 = min(%13) /* ty=float32 */;
%22 = divide(%21, %19) /* ty=float32 */;
%23 = subtract(0f /* ty=float32 */, %22) /* ty=float32 */;
%24 = clip(%23, a_min=0f, a_max=255f) /* ty=float32 */;
%25 = round(%24) /* ty=float32 */;
%26 = cast(%25, dtype="uint8") /* ty=uint8 */;
%27 = cast(%26, dtype="int32") /* ty=int32 */;
%28 = round(%20);
%29 = cast(%27, dtype="float32");
%30 = add(%28, %29);
%31 = clip(%30, a_min=0f, a_max=255f);
%32 = cast(%31, dtype="uint8");
%33 = reshape(%32, newshape=[-1, 768]) /* ty=Tensor[(54, 768), uint8] */;
%34 = cast(%33, dtype="int32");
%35 = sum(%34, axis=[1], keepdims=True);
%36 = nn.dense(%33, meta[relay.Constant][5] /* ty=Tensor[(768, 768), int8] */, units=768, out_dtype="int32");
%37 = multiply(0 /* ty=int32 */, %35);
%38 = cast(%26, dtype="int32") /* ty=int32 */;
%39 = multiply(%38, 0 /* ty=int32 */);
%40 = cast(meta[relay.Constant][5] /* ty=Tensor[(768, 768), int8] */, dtype="int32");
%41 = sum(%40, axis=[1]);
%42 = multiply(%39, 768);
%43 = multiply(%38, %41);
%44 = subtract(%36, %37);
%45 = subtract(%42, %43);
%46 = add(%44, %45);
%47 = reshape(%46, newshape=[1, 54, 768]) /* ty=Tensor[(1, 54, 768), int32] */;
%48 = cast(%47, dtype="float32") /* ty=Tensor[(1, 54, 768), float32] */;
%49 = multiply(%19, 0.00736962f /* ty=float32 */) /* ty=float32 */;
%50 = multiply(%48, %49) /* ty=Tensor[(1, 54, 768), float32] */;
%51 = add(meta[relay.Constant][0] /* ty=Tensor[(768), float32] */, %50) /* ty=Tensor[(1, 54, 768), float32] */;
%52 = reshape(%51, newshape=[1, -1, 12, 64]) /* ty=Tensor[(1, 54, 12, 64), float32] */;
%53 = transpose(%52, axes=[0, 2, 3, 1]) /* ty=Tensor[(1, 12, 64, 54), float32] */;
%54 = max(%53) /* ty=float32 */;
%55 = min(%53) /* ty=float32 */;
%56 = maximum(0f /* ty=float32 */, %54) /* ty=float32 */;
%57 = minimum(0f /* ty=float32 */, %55) /* ty=float32 */;
%58 = subtract(%56, %57) /* ty=float32 */;
%59 = min(%53) /* ty=float32 */;
%60 = divide(%58, 255f /* ty=float32 */) /* ty=float32 */;
%61 = divide(%59, %60) /* ty=float32 */;
%62 = subtract(0f /* ty=float32 */, %61) /* ty=float32 */;
%63 = clip(%62, a_min=0f, a_max=255f) /* ty=float32 */;
%64 = round(%63) /* ty=float32 */;
%65 = cast(%64, dtype="uint8") /* ty=uint8 */;
cast(%65, dtype="int32") /* ty=int32 */
```

### Environment

Linux 20.04 LTS

### Steps to reproduce

The usual steps for compiling and running the ONNX model with the VirtualMachine through the Python frontend.

### Triage

* frontend:onnx
* relay:qnn

### Notes

There are several possible formats for scales and zero points: (a) scalar or tensor, (b) constant or variable. Currently, only a constant scalar is supported for qnn batch_matmul. There appear to be no reasonable constraints on the zero-point format for any qnn operation.
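The variable zero point comes from the dynamic-quantization pattern visible in the IR dump (`%14`–`%27`): the scale and zero point are derived from the activation's min/max at runtime, so the zero point reaches qnn.batch_matmul as a computed expression rather than a `relay.Constant`. A plain-Python sketch of that computation (the function name and sample input are illustrative, not taken from the model):

```python
import numpy as np

def dynamic_qparams(x, qmin=0.0, qmax=255.0):
    # %16/%17: the representable range must include zero
    rmax = max(0.0, float(x.max()))
    rmin = min(0.0, float(x.min()))
    # %18/%19: scale maps the float range onto [qmin, qmax]
    scale = (rmax - rmin) / (qmax - qmin)
    # %22-%26: zero point is the quantized value that represents 0.0
    zero_point = np.uint8(np.clip(round(0.0 - rmin / scale), qmin, qmax))
    return scale, zero_point

# Example: an activation spanning [-1.0, 3.0]
scale, zp = dynamic_qparams(np.array([-1.0, 0.0, 3.0], dtype=np.float32))
```

Because both outputs depend on the tensor contents, neither can be folded to a constant at compile time for a dynamically quantized model.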
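The failure itself comes from the canonicalization path calling `GetScalarFromConstant` on the zero point. A plain-Python sketch (no TVM dependency; the class names are simplified stand-ins for `relay.Constant` and non-constant relay expressions, not the real TVM types) of why a computed zero point trips the guard:

```python
class Constant:
    """Stand-in for relay.Constant: holds a compile-time scalar."""
    def __init__(self, value):
        self.value = value

class Call:
    """Stand-in for any computed relay expression (e.g. the min/max chain)."""
    pass

def get_scalar_from_constant(expr):
    # Mirrors the "Check failed: (n) is false: Expr must be a constant expr"
    # guard in pattern_utils.h: only a constant can be read at compile time.
    if not isinstance(expr, Constant):
        raise TypeError("Expr must be a constant expr")
    return expr.value

zp_ok = get_scalar_from_constant(Constant(0))   # constant zero point: fine
```

Supporting a variable zero point would mean keeping the zero-point expression in the canonicalized graph instead of extracting a scalar here.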
