vvchernov opened a new issue, #13467: URL: https://github.com/apache/tvm/issues/13467
### Expected behavior

Support for a variable zero point in qnn batch_matmul.

### Actual behavior

The quantized [model](https://huggingface.co/philschmid/quantized-distilbert-banking77?text=I+like+you.+I+love+you) from Hugging Face fails during compilation because it uses a variable zero point. To observe this problem, the input types matching line from [issue #13466](https://github.com/apache/tvm/issues/13466) should be commented out. Part of the error log:

```
1: tvm::relay::qnn::QnnBatchMatmulCanonicalize(tvm::Attrs const&, tvm::runtime::Array<tvm::RelayExpr, void> const&, tvm::runtime::Array<tvm::Type, void> const&)
      at /home/user/Workshop/tvm/src/relay/qnn/op/batch_matmul.cc:179
0: int tvm::relay::GetScalarFromConstant<int>(tvm::RelayExpr)
      at /home/user/Workshop/tvm/src/relay/qnn/op/../../op/nn/../../transforms/pattern_utils.h:641
File "/home/user/Workshop/tvm/src/relay/transforms/./pattern_utils.h", line 641
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (n) is false: Expr must be a constant expr - #[version = "0.0.5"]
free_var %input_ids: Tensor[(1, 54), int64] /* ty=Tensor[(1, 54), int64] */;
%0 = less(%input_ids, 0i64 /* ty=int64 */) /* ty=Tensor[(1, 54), bool] */;
%1 = add(%input_ids, 30522i64 /* ty=int64 */) /* ty=Tensor[(1, 54), int64] */;
%2 = where(%0, %1, %input_ids) /* ty=Tensor[(1, 54), int64] */;
%3 = take(meta[relay.Constant][1] /* ty=Tensor[(30522, 768), float32] */, %2, axis=0) /* ty=Tensor[(1, 54, 768), float32] */;
%4 = add(%3, meta[relay.Constant][2] /* ty=Tensor[(1, 54, 768), float32] */) /* ty=Tensor[(1, 54, 768), float32] */;
%5 = mean(%4, axis=[-1], keepdims=True) /* ty=Tensor[(1, 54, 1), float32] */;
%6 = subtract(%4, %5) /* ty=Tensor[(1, 54, 768), float32] */;
%7 = power(%6, 2f /* ty=float32 */) /* ty=Tensor[(1, 54, 768), float32] */;
%8 = mean(%7, axis=[-1], keepdims=True) /* ty=Tensor[(1, 54, 1), float32] */;
%9 = add(%8, 1e-12f /* ty=float32 */) /* ty=Tensor[(1, 54, 1), float32] */;
%10 = sqrt(%9) /* ty=Tensor[(1, 54, 1), float32] */;
%11 = divide(%6, %10) /* ty=Tensor[(1, 54, 768), float32] */;
%12 = multiply(%11, meta[relay.Constant][3] /* ty=Tensor[(768), float32] */) /* ty=Tensor[(1, 54, 768), float32] */;
%13 = add(%12, meta[relay.Constant][4] /* ty=Tensor[(768), float32] */) /* ty=Tensor[(1, 54, 768), float32] */;
%14 = max(%13) /* ty=float32 */;
%15 = min(%13) /* ty=float32 */;
%16 = maximum(0f /* ty=float32 */, %14) /* ty=float32 */;
%17 = minimum(0f /* ty=float32 */, %15) /* ty=float32 */;
%18 = subtract(%16, %17) /* ty=float32 */;
%19 = divide(%18, 255f /* ty=float32 */) /* ty=float32 */;
%20 = divide(%13, %19);
%21 = min(%13) /* ty=float32 */;
%22 = divide(%21, %19) /* ty=float32 */;
%23 = subtract(0f /* ty=float32 */, %22) /* ty=float32 */;
%24 = clip(%23, a_min=0f, a_max=255f) /* ty=float32 */;
%25 = round(%24) /* ty=float32 */;
%26 = cast(%25, dtype="uint8") /* ty=uint8 */;
%27 = cast(%26, dtype="int32") /* ty=int32 */;
%28 = round(%20);
%29 = cast(%27, dtype="float32");
%30 = add(%28, %29);
%31 = clip(%30, a_min=0f, a_max=255f);
%32 = cast(%31, dtype="uint8");
%33 = reshape(%32, newshape=[-1, 768]) /* ty=Tensor[(54, 768), uint8] */;
%34 = cast(%33, dtype="int32");
%35 = sum(%34, axis=[1], keepdims=True);
%36 = nn.dense(%33, meta[relay.Constant][5] /* ty=Tensor[(768, 768), int8] */, units=768, out_dtype="int32");
%37 = multiply(0 /* ty=int32 */, %35);
%38 = cast(%26, dtype="int32") /* ty=int32 */;
%39 = multiply(%38, 0 /* ty=int32 */);
%40 = cast(meta[relay.Constant][5] /* ty=Tensor[(768, 768), int8] */, dtype="int32");
%41 = sum(%40, axis=[1]);
%42 = multiply(%39, 768);
%43 = multiply(%38, %41);
%44 = subtract(%36, %37);
%45 = subtract(%42, %43);
%46 = add(%44, %45);
%47 = reshape(%46, newshape=[1, 54, 768]) /* ty=Tensor[(1, 54, 768), int32] */;
%48 = cast(%47, dtype="float32") /* ty=Tensor[(1, 54, 768), float32] */;
%49 = multiply(%19, 0.00736962f /* ty=float32 */) /* ty=float32 */;
%50 = multiply(%48, %49) /* ty=Tensor[(1, 54, 768), float32] */;
%51 = add(meta[relay.Constant][0] /* ty=Tensor[(768), float32] */, %50) /* ty=Tensor[(1, 54, 768), float32] */;
%52 = reshape(%51, newshape=[1, -1, 12, 64]) /* ty=Tensor[(1, 54, 12, 64), float32] */;
%53 = transpose(%52, axes=[0, 2, 3, 1]) /* ty=Tensor[(1, 12, 64, 54), float32] */;
%54 = max(%53) /* ty=float32 */;
%55 = min(%53) /* ty=float32 */;
%56 = maximum(0f /* ty=float32 */, %54) /* ty=float32 */;
%57 = minimum(0f /* ty=float32 */, %55) /* ty=float32 */;
%58 = subtract(%56, %57) /* ty=float32 */;
%59 = min(%53) /* ty=float32 */;
%60 = divide(%58, 255f /* ty=float32 */) /* ty=float32 */;
%61 = divide(%59, %60) /* ty=float32 */;
%62 = subtract(0f /* ty=float32 */, %61) /* ty=float32 */;
%63 = clip(%62, a_min=0f, a_max=255f) /* ty=float32 */;
%64 = round(%63) /* ty=float32 */;
%65 = cast(%64, dtype="uint8") /* ty=uint8 */;
cast(%65, dtype="int32") /* ty=int32 */
```

### Environment

Linux 20.04 LTS

### Steps to reproduce

The usual steps for compiling and running the ONNX model with the VirtualMachine through the Python frontend.

### Triage

* frontend:onnx
* relay:qnn

### Notes

There are several possible formats for scales and zero points: (a) scalar or tensor, (b) constant or variable. Currently, only a constant scalar is supported for qnn batch_matmul. There appear to be no reasonable constraints on the zero-point format for any qnn operation.
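The variable zero point comes from the dynamic-quantization pattern visible in the IR dump (`%14`–`%27`): the scale and zero point are derived from the activation's min/max at runtime, so the zero point reaches qnn.batch_matmul as a computed expression rather than a `relay.Constant`. A plain-Python sketch of that computation (the function name and sample input are illustrative, not taken from the model):

```python
import numpy as np

def dynamic_qparams(x, qmin=0.0, qmax=255.0):
    # %16/%17: the representable range must include zero
    rmax = max(0.0, float(x.max()))
    rmin = min(0.0, float(x.min()))
    # %18/%19: scale maps the float range onto [qmin, qmax]
    scale = (rmax - rmin) / (qmax - qmin)
    # %22-%26: zero point is the quantized value that represents 0.0
    zero_point = np.uint8(np.clip(round(0.0 - rmin / scale), qmin, qmax))
    return scale, zero_point

# Example: an activation spanning [-1.0, 3.0]
scale, zp = dynamic_qparams(np.array([-1.0, 0.0, 3.0], dtype=np.float32))
```

Because both outputs depend on the tensor contents, neither can be folded to a constant at compile time for a dynamically quantized model.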
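The failure itself comes from the canonicalization path calling `GetScalarFromConstant` on the zero point. A plain-Python sketch (no TVM dependency; the class names are simplified stand-ins for `relay.Constant` and non-constant relay expressions, not the real TVM types) of why a computed zero point trips the guard:

```python
class Constant:
    """Stand-in for relay.Constant: holds a compile-time scalar."""
    def __init__(self, value):
        self.value = value

class Call:
    """Stand-in for any computed relay expression (e.g. the min/max chain)."""
    pass

def get_scalar_from_constant(expr):
    # Mirrors the "Check failed: (n) is false: Expr must be a constant expr"
    # guard in pattern_utils.h: only a constant can be read at compile time.
    if not isinstance(expr, Constant):
        raise TypeError("Expr must be a constant expr")
    return expr.value

zp_ok = get_scalar_from_constant(Constant(0))   # constant zero point: fine
```

Supporting a variable zero point would mean keeping the zero-point expression in the canonicalized graph instead of extracting a scalar here.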
