anijain2305 opened a new pull request #6115: URL: https://github.com/apache/incubator-tvm/pull/6115
Using MKL for quantized dense, following the MKL fallback for FP32 dense. On a C5.12xlarge instance (Cascade Lake, with VNNI support), results for BERT base are as follows (latency in ms):

Type | Batch size | MXNet+MKL | TVM+MKL
-- | -- | -- | --
FP32 | 128 | 33.56 | 16.83
Quantized | 128 | 23.94697 | 17.59

The overhead of TVM quantized relative to TVM FP32 comes from the fact that only the Dense ops in the network are quantized, so each quantized Dense pays the cost of a back-and-forth quantize and dequantize. We will investigate whether quantize/dequantize can be improved.

@icemelon9
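For context, the quantize/dequantize round trip mentioned above can be sketched as follows. This is an illustrative, self-contained example (plain Python, not TVM or MKL code): a symmetric per-tensor int8 quantization around a dense op, with int32 accumulation, showing the extra conversion steps that surround each quantized Dense when the rest of the graph stays FP32. All names and the tiny data here are made up for illustration.

```python
# Illustrative sketch (not TVM/MKL code): symmetric per-tensor int8
# quantization (zero_point = 0) around a dense (matmul) op.

def quantize(x, scale):
    """FP32 -> int8 with a per-tensor scale; clamp to the int8 range."""
    return [max(-128, min(127, round(v / scale))) for v in x]

def dequantize(x, scale):
    """int8/int32 -> FP32."""
    return [v * scale for v in x]

def qdense(a_q, w_q, a_scale, w_scale):
    """int8 x int8 dense, accumulating in int32, then rescaling to FP32.
    a_q: length-K activation row; w_q: N rows of length-K weights."""
    acc = [sum(a * w for a, w in zip(a_q, row)) for row in w_q]  # int32 accum
    return dequantize(acc, a_scale * w_scale)

# Toy data (hypothetical values).
activations = [0.5, -1.2, 2.0]
weights = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]

# Scales chosen so the max-magnitude value maps to 127.
a_scale, w_scale = 2.0 / 127, 0.6 / 127

a_q = quantize(activations, a_scale)              # extra op vs. pure FP32
w_q = [quantize(row, w_scale) for row in weights]  # done once, offline
out = qdense(a_q, w_q, a_scale, w_scale)           # includes dequantize

# FP32 reference: out should closely approximate this.
ref = [sum(a * w for a, w in zip(activations, row)) for row in weights]
```

The quantize on the activations and the dequantize on the output are the per-Dense costs referred to above; when only Dense is quantized, every such op pays them at the FP32/int8 boundary.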
