cbalint13 opened a new pull request #5800:
URL: https://github.com/apache/incubator-tvm/pull/5800
This small PR optimizes the GEMM (nn.dense) import from ONNX. It also leads to
much better quantization decisions.
**Description**
A single ```Gemm``` operator from ONNX expands into a series of
```transpose```, ```multiply```, ```dense```, and ```bias_add``` layers in
accordance with the formula ```Y = alpha * A * B + beta * C```.
**Outcome**
1. This PR eliminates one ```multiply()``` layer in the case of ```alpha == 1```.
2. The omitted layer leads to much better decisions in the final quantization
```realization``` step.
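The expansion, and the redundancy of the ```multiply``` when ```alpha == 1```, can be sketched in plain NumPy (a minimal illustration; the helper name and shapes are hypothetical, not the actual frontend code):

```python
import numpy as np

def gemm_expand(A, B, C, alpha=1.0, beta=1.0, trans_b=True):
    """Mimic how an ONNX Gemm lowers: Y = alpha * A * B + beta * C.
    Hypothetical sketch, not the actual TVM importer code."""
    if trans_b:
        B = B.T                # transpose layer
    if alpha != 1.0:
        A = A * alpha          # multiply layer -- skipped when alpha == 1
    Y = A @ B                  # dense layer
    return Y + beta * C        # bias_add layer

A = np.random.rand(1, 800).astype("float32")
B = np.random.rand(512, 800).astype("float32")
C = np.zeros(512, dtype="float32")

# With alpha == 1 the multiply layer is a no-op: results are identical.
assert np.allclose(gemm_expand(A, B, C, alpha=1.0),
                   (A * np.float32(1.0)) @ B.T + C)
```

Skipping the multiply at import time means the quantizer never sees the spurious ```multiply(x, 1f)``` node, so it does not insert a ```simulated_quantize``` around it.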
**Intermediate Results**
* Before
```
%15 = nn.batch_flatten(%14) /* ty=Tensor[(1, 800), float32] */;
%16 = multiply(%15, 1f /* ty=float32 */) /* ty=Tensor[(1, 800), float32]
*/;
%17 = relay.op.annotation.simulated_quantize(%16, 0.0625f /* ty=float32
*/, -127f /* ty=float32 */, 127f /* ty=float32 */, kind=1) /* ty=Tensor[(1,
800), float32] */;
%18 = nn.dense(%17, meta[relay.Constant][2] /* ty=Tensor[(512, 800),
float32] */ /* ty=Tensor[(512, 800), float32] */, units=512) /* ty=Tensor[(1,
512), float32] */;
```
* After
```
%15 = nn.batch_flatten(%14) /* ty=Tensor[(1, 800), float32] */;
%16 = relay.op.annotation.simulated_quantize(%15, 0.0625f /* ty=float32
*/, -127f /* ty=float32 */, 127f /* ty=float32 */, kind=1) /* ty=Tensor[(1,
800), float32] */;
%17 = nn.dense(%16, meta[relay.Constant][2] /* ty=Tensor[(512, 800),
float32] */ /* ty=Tensor[(512, 800), float32] */, units=512) /* ty=Tensor[(1,
512), float32] */;
```
**Quantized Results**
* Before
```
%35 = nn.batch_flatten(%34) /* ty=Tensor[(1, 512), int32] */;
%36 = cast(%35, dtype="float32") /* ty=Tensor[(1, 512), float32] */;
%37 = multiply(%36, 6.10352e-05f /* ty=float32 */) /* ty=Tensor[(1, 512),
float32] */;
%38 = multiply(%37, 1f /* ty=float32 */) /* ty=Tensor[(1, 512), float32]
*/;
%39 = multiply(%38, 16f /* ty=float32 */) /* ty=Tensor[(1, 512), float32]
*/;
%40 = round(%39) /* ty=Tensor[(1, 512), float32] */;
%41 = clip(%40, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 512), float32]
*/;
```
* After
```
%29 = nn.batch_flatten(%28) /* ty=Tensor[(1, 512), int32] */;
%30 = add(%29, 512 /* ty=int32 */) /* ty=Tensor[(1, 512), int32] */;
%31 = right_shift(%30, 10 /* ty=int32 */) /* ty=Tensor[(1, 512), int32] */;
%32 = clip(%31, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 512), int32] */;
```
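The integer path above is classic fixed-point requantization: the combined float scale ```6.10352e-05 * 16``` is approximately ```1/1024```, so it becomes a right shift by 10, and adding ```512``` (half of ```2**10```) first makes the shift round to nearest rather than toward negative infinity. A minimal NumPy sketch of the two paths (constants taken from the IR dumps above; helper names are hypothetical):

```python
import numpy as np

def requantize_float(x):
    # "Before" path: cast, chained float multiplies, round, clip.
    y = x.astype("float32") * 6.10352e-05 * 1.0 * 16.0   # combined scale ~= 1/1024
    return np.clip(np.round(y), -127, 127)

def requantize_int(x):
    # "After" path: pure integer add + arithmetic right shift + clip.
    y = (x + 512) >> 10        # round-to-nearest division by 1024
    return np.clip(y, -127, 127)

x = np.array([-2048, -1000, 1000, 1023, 200000], dtype=np.int32)
print(requantize_int(x))       # integer-only result, no float ops in the hot path
```

With the redundant ```multiply(x, 1f)``` gone, the realization pass can fold the whole scale chain into this shift form instead of falling back to float arithmetic.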