anijain2305 opened a new pull request #6115:
URL: https://github.com/apache/incubator-tvm/pull/6115


   This PR uses MKL for quantized dense, following the existing MKL fallback for FP32 dense.
   
   On a C5.12xlarge instance (Cascade Lake, with VNNI support), results for BERT base are as
follows (latency in ms):
   
   Type | Batch size | MXNet+MKL | TVM+MKL
   -- | -- | -- | --
   FP32 | 128 | 33.56 | 16.83
   Quantized | 128 | 23.95 | 17.59
   
   The small gap between TVM FP32 and TVM quantized is because only the dense ops
are quantized in the network, so each quantized dense pays the cost of a back-and-forth
quantize and dequantize. We will investigate whether quantize and dequantize can be improved.
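   To make the overhead concrete, here is a hedged NumPy sketch (not TVM's actual implementation) of the quantize → int8 dense → dequantize round trip that surrounds each quantized dense op; the scales and symmetric int8 scheme are assumptions for illustration, and the int8×int8→int32 GEMM step is what an MKL integer GEMM routine would perform:

   ```python
   import numpy as np

   def quantize(x, scale):
       # Symmetric int8 quantization (assumed scheme for illustration).
       return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

   def dequantize(x_int32, scale):
       # Map int32 accumulators back to FP32.
       return x_int32.astype(np.float32) * scale

   def quantized_dense(data_fp32, weight_fp32, data_scale=0.05, weight_scale=0.05):
       qd = quantize(data_fp32, data_scale)
       qw = quantize(weight_fp32, weight_scale)
       # int8 x int8 with int32 accumulation; in TVM+MKL this step would be
       # handled by an MKL integer GEMM, plain NumPy stands in here.
       acc = qd.astype(np.int32) @ qw.astype(np.int32).T
       return dequantize(acc, data_scale * weight_scale)

   np.random.seed(0)
   x = np.random.randn(4, 8).astype(np.float32)  # activations
   w = np.random.randn(3, 8).astype(np.float32)  # weights
   out = quantized_dense(x, w)
   ref = x @ w.T  # FP32 reference; out approximates it within quantization error
   ```

   Because only dense is quantized, each such op carries the quantize/dequantize conversions on top of the GEMM itself, which is the overhead discussed above.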
   
   @icemelon9 
   

