I had been working on the sparse tensor project with Haibin. After the first stage of that wrapped up, I started working on the quantization project (INT8 inference). The benefits of using quantized models for inference include much higher throughput than the FP32 model, with acceptable accuracy loss, as well as smaller model files for storage-constrained devices. The work currently targets quantizing ConvNets, and we will consider extending it to RNNs after getting good results on image models. We expect to support quantization on CPU, GPU, and mobile devices.
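To make the idea concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization, one common scheme for this kind of work. This is illustrative only and not the project's actual implementation; the function names and the choice of a single per-tensor scale are my assumptions.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric linear quantization (illustrative, not MXNet's actual code):
    # map the range [-max|x|, +max|x|] onto the int8 range [-127, 127].
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Recover an approximate FP32 tensor from the int8 values and the scale.
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
```

Each INT8 value takes a quarter of the storage of an FP32 value, and the rounding error per element is bounded by half the scale, which is where the "acceptable accuracy loss" trade-off comes from.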
