cbalint13 opened a new pull request #5800:
URL: https://github.com/apache/incubator-tvm/pull/5800


   This small PR optimizes the GEMM (```nn.dense```) import via ONNX. It also 
enables much better quantization decisions.
   
   **Description**
   
   A single ```Gemm``` operator from ONNX expands into a series of 
```transpose```, ```multiply```, ```dense``` and ```bias_add``` layers, in 
accordance with the formula ```Y = alpha * A * B + beta * C```.
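   For intuition, the expansion and the simplification this PR enables can be 
sketched in plain NumPy (this is an illustration of the ```Gemm``` semantics, 
not the actual importer code; the function name ```gemm``` is hypothetical):

   ```python
   import numpy as np

   def gemm(A, B, C, alpha=1.0, beta=1.0, trans_b=True):
       # ONNX Gemm: Y = alpha * A @ B' + beta * C
       # (the importer emits transpose, dense, multiply, bias_add for these steps)
       X = A @ (B.T if trans_b else B)
       # The PR's optimization: when alpha == 1 the scalar multiply layer is
       # not emitted at all, so dense() feeds the next layer directly.
       if alpha != 1.0:
           X = alpha * X
       return X + beta * C
   ```

   Skipping the no-op ```multiply``` is what lets the quantizer annotate the 
```dense``` input directly, as shown in the traces below.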
   
   **Outcome**
   1. This PR eliminates one ```multiply()``` layer when ```alpha == 1```.
   2. The omitted layer leads to much better decisions in the final quantization 
```realization``` step.
   
   
   **Intermediate Results**
   
   * Before
   ```
     %15 = nn.batch_flatten(%14) /* ty=Tensor[(1, 800), float32] */;
     %16 = multiply(%15, 1f /* ty=float32 */) /* ty=Tensor[(1, 800), float32] 
*/;
     %17 = relay.op.annotation.simulated_quantize(%16, 0.0625f /* ty=float32 
*/, -127f /* ty=float32 */, 127f /* ty=float32 */, kind=1) /* ty=Tensor[(1, 
800), float32] */;
     %18 = nn.dense(%17, meta[relay.Constant][2] /* ty=Tensor[(512, 800), 
float32] */ /* ty=Tensor[(512, 800), float32] */, units=512) /* ty=Tensor[(1, 
512), float32] */;
   ```
   * After
   ```
     %15 = nn.batch_flatten(%14) /* ty=Tensor[(1, 800), float32] */;
     %16 = relay.op.annotation.simulated_quantize(%15, 0.0625f /* ty=float32 
*/, -127f /* ty=float32 */, 127f /* ty=float32 */, kind=1) /* ty=Tensor[(1, 
800), float32] */;
     %17 = nn.dense(%16, meta[relay.Constant][2] /* ty=Tensor[(512, 800), 
float32] */ /* ty=Tensor[(512, 800), float32] */, units=512) /* ty=Tensor[(1, 
512), float32] */;
   ```
   **Quantized Results**
   
   * Before
   ```
     %35 = nn.batch_flatten(%34) /* ty=Tensor[(1, 512), int32] */;
     %36 = cast(%35, dtype="float32") /* ty=Tensor[(1, 512), float32] */;
     %37 = multiply(%36, 6.10352e-05f /* ty=float32 */) /* ty=Tensor[(1, 512), 
float32] */;
     %38 = multiply(%37, 1f /* ty=float32 */) /* ty=Tensor[(1, 512), float32] 
*/;
     %39 = multiply(%38, 16f /* ty=float32 */) /* ty=Tensor[(1, 512), float32] 
*/;
     %40 = round(%39) /* ty=Tensor[(1, 512), float32] */;
     %41 = clip(%40, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 512), float32] 
*/;
   ```
   * After
   ```
     %29 = nn.batch_flatten(%28) /* ty=Tensor[(1, 512), int32] */;
     %30 = add(%29, 512 /* ty=int32 */) /* ty=Tensor[(1, 512), int32] */;
     %31 = right_shift(%30, 10 /* ty=int32 */) /* ty=Tensor[(1, 512), int32] */;
     %32 = clip(%31, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 512), int32] */;
   ```
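   For intuition on why the quantized result improves: the float chain above 
computes ```round(x * 6.10352e-05 * 16)```, i.e. roughly ```round(x / 1024)```, 
whereas after this PR the realization step can express the same rescale as a 
pure integer ```add``` plus ```right_shift``` (add half the divisor, then 
arithmetic-shift by 10). A NumPy sketch of the equivalence (illustration only, 
not TVM code; both function names are hypothetical):

   ```python
   import numpy as np

   def requant_float(x):
       # Float path: cast, scale by 6.10352e-05 (~1/16384), scale by 16, round, clip.
       y = np.round(x.astype(np.float32) * 6.10352e-05 * 16.0)
       return np.clip(y, -127, 127).astype(np.int32)

   def requant_int(x):
       # Integer path: (x + 512) >> 10 is round-to-nearest division by 1024
       # (512 is half the divisor), followed by the same clip.
       y = np.right_shift(x + 512, 10)
       return np.clip(y, -127, 127)
   ```

   The two paths agree except for half-way rounding ties (```np.round``` rounds 
half to even, the shift rounds half upward), which is immaterial after the clip 
to ```[-127, 127]```.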


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
