All of the above `qnn` ops will be lowered to existing Relay primitive ops by
a Relay pass (for example, using the ForwardRewrite infra). For example,
`relay.op.qnn.conv2d` can be lowered to
~~~
fn (%quantized_data: Tensor[(2, 1, 2, 4), uint8],
    %weight: Tensor[(3, 1, 2, 2), uint8]) -> Tensor[(2, 3, 1, 3), uint8] {
  // Accumulate the convolution in int32 to avoid overflow.
  %0 = nn.conv2d(%quantized_data, %weight, kernel_size=[2, 2], out_dtype="int32")
  // Requantize the int32 accumulator back to uint8.
  %1 = cast(%0, dtype="float32")
  %2 = multiply(%1, 0.25098f)        // requantize scale
  %3 = round(%2)
  %4 = cast(%3, dtype="int32")
  %5 = clip(%4, a_min=0, a_max=255)  // saturate to the uint8 range
  cast(%5, dtype="uint8")
}
~~~
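
For intuition, here is a minimal NumPy sketch of the same arithmetic. The
shapes and the 0.25098 scale are taken from the snippet above; the naive loop
convolution and the random inputs are purely illustrative (in a real lowering
the scale would typically come from something like
`input_scale * weight_scale / output_scale`):

~~~python
import numpy as np

# Hypothetical quantized inputs matching the shapes above (NCHW / OIHW).
data = np.random.randint(0, 256, size=(2, 1, 2, 4), dtype=np.uint8)
weight = np.random.randint(0, 256, size=(3, 1, 2, 2), dtype=np.uint8)

# %0: naive direct convolution, accumulating in int32 (stands in for nn.conv2d).
acc = np.zeros((2, 3, 1, 3), dtype=np.int32)
for n in range(2):
    for o in range(3):
        for y in range(1):
            for x in range(3):
                window = data[n, :, y:y+2, x:x+2].astype(np.int32)
                acc[n, o, y, x] = np.sum(window * weight[o].astype(np.int32))

# %1-%5: requantize the int32 accumulator back to uint8.
scaled = acc.astype(np.float32) * 0.25098    # multiply by the requantize scale
rounded = np.round(scaled).astype(np.int32)  # round, cast back to int32
clipped = np.clip(rounded, 0, 255)           # saturate to the uint8 range
out = clipped.astype(np.uint8)               # final cast to uint8
~~~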
---------------------
I have yet to understand what needs to be done for softmax. I will have to
look at a quantized model to figure that out.