cbalint13 opened a new pull request #5800:
URL: https://github.com/apache/incubator-tvm/pull/5800


   This small PR optimizes the GEMM (```nn.dense```) import via ONNX. It also 
enables much better quantization decisions.
   
   **Description**
   
   A single ```Gemm``` operator from ONNX expands into a series of 
```transpose```, ```multiply```, ```dense``` and ```bias_add``` layers, in 
accordance with the formula ```Y = alpha * A * B + beta * C```.
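   For intuition, the expansion and the simplification this PR enables can be 
sketched in plain NumPy (this is an illustration of the ```Gemm``` semantics, 
not the actual importer code; the function name ```gemm``` is hypothetical):

   ```python
   import numpy as np

   def gemm(A, B, C, alpha=1.0, beta=1.0, trans_b=True):
       # ONNX Gemm: Y = alpha * A @ B' + beta * C
       # (the importer emits transpose, dense, multiply, bias_add for these steps)
       X = A @ (B.T if trans_b else B)
       # The PR's optimization: when alpha == 1 the scalar multiply layer is
       # not emitted at all, so dense() feeds the next layer directly.
       if alpha != 1.0:
           X = alpha * X
       return X + beta * C
   ```

   Skipping the no-op ```multiply``` is what lets the quantizer annotate the 
```dense``` input directly, as shown in the traces below.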
   
   **Outcome**
   1. This PR eliminates one ```multiply()``` layer when ```alpha == 1```.
   2. The omitted layer leads to much better decisions in the final quantization 
```realization``` step.
   
   
   **Intermediate Results**
   
   * Before
   ```
     %15 = nn.batch_flatten(%14) /* ty=Tensor[(1, 800), float32] */;
     %16 = multiply(%15, 1f /* ty=float32 */) /* ty=Tensor[(1, 800), float32] 
*/;
     %17 = relay.op.annotation.simulated_quantize(%16, 0.0625f /* ty=float32 
*/, -127f /* ty=float32 */, 127f /* ty=float32 */, kind=1) /* ty=Tensor[(1, 
800), float32] */;
     %18 = nn.dense(%17, meta[relay.Constant][2] /* ty=Tensor[(512, 800), 
float32] */ /* ty=Tensor[(512, 800), float32] */, units=512) /* ty=Tensor[(1, 
512), float32] */;
   ```
   * After
   ```
     %15 = nn.batch_flatten(%14) /* ty=Tensor[(1, 800), float32] */;
     %16 = relay.op.annotation.simulated_quantize(%15, 0.0625f /* ty=float32 
*/, -127f /* ty=float32 */, 127f /* ty=float32 */, kind=1) /* ty=Tensor[(1, 
800), float32] */;
     %17 = nn.dense(%16, meta[relay.Constant][2] /* ty=Tensor[(512, 800), 
float32] */ /* ty=Tensor[(512, 800), float32] */, units=512) /* ty=Tensor[(1, 
512), float32] */;
   ```
   **Quantized Results**
   
   * Before
   ```
     %35 = nn.batch_flatten(%34) /* ty=Tensor[(1, 512), int32] */;
     %36 = cast(%35, dtype="float32") /* ty=Tensor[(1, 512), float32] */;
     %37 = multiply(%36, 6.10352e-05f /* ty=float32 */) /* ty=Tensor[(1, 512), 
float32] */;
     %38 = multiply(%37, 1f /* ty=float32 */) /* ty=Tensor[(1, 512), float32] 
*/;
     %39 = multiply(%38, 16f /* ty=float32 */) /* ty=Tensor[(1, 512), float32] 
*/;
     %40 = round(%39) /* ty=Tensor[(1, 512), float32] */;
     %41 = clip(%40, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 512), float32] 
*/;
   ```
   * After
   ```
     %29 = nn.batch_flatten(%28) /* ty=Tensor[(1, 512), int32] */;
     %30 = add(%29, 512 /* ty=int32 */) /* ty=Tensor[(1, 512), int32] */;
     %31 = right_shift(%30, 10 /* ty=int32 */) /* ty=Tensor[(1, 512), int32] */;
     %32 = clip(%31, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 512), int32] */;
   ```
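   For intuition on why the quantized result improves: the float chain above 
computes ```round(x * 6.10352e-05 * 16)```, i.e. roughly ```round(x / 1024)```, 
whereas after this PR the realization step can express the same rescale as a 
pure integer ```add``` plus ```right_shift``` (add half the divisor, then 
arithmetic-shift by 10). A NumPy sketch of the equivalence (illustration only, 
not TVM code; both function names are hypothetical):

   ```python
   import numpy as np

   def requant_float(x):
       # Float path: cast, scale by 6.10352e-05 (~1/16384), scale by 16, round, clip.
       y = np.round(x.astype(np.float32) * 6.10352e-05 * 16.0)
       return np.clip(y, -127, 127).astype(np.int32)

   def requant_int(x):
       # Integer path: (x + 512) >> 10 is round-to-nearest division by 1024
       # (512 is half the divisor), followed by the same clip.
       y = np.right_shift(x + 512, 10)
       return np.clip(y, -127, 127)
   ```

   The two paths agree except for half-way rounding ties (```np.round``` rounds 
half to even, the shift rounds half upward), which is immaterial after the clip 
to ```[-127, 127]```.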


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
