It would be awesome if MXNet were the first DL framework to support Nvidia Volta. What do you all think about cutting a v0.12 release once that integration is ready?
On Wed, Sep 27, 2017 at 10:38 PM, Jun Wu <[email protected]> wrote:

> I had been working on the sparse tensor project with Haibin. After it was
> wrapped up for the first stage, I started my work on the quantization
> project (INT-8 inference). The benefits of using quantized models for
> inference include much higher throughput than with FP32 models, with
> acceptable accuracy loss, as well as compact models that can be stored on
> small devices. The work currently targets quantizing ConvNets, and we will
> consider expanding it to RNN networks after getting good results on image
> models. Meanwhile, it's expected to support quantization on CPU, GPU, and
> mobile devices.
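
For anyone on the list unfamiliar with the INT-8 idea Jun describes, below is a minimal sketch of symmetric per-tensor linear quantization in plain NumPy. The function names and the scale scheme here are illustrative assumptions on my part, not the actual API or implementation in the quantization work.

    import numpy as np

    def quantize_int8(x):
        """Symmetric linear quantization of an FP32 array to INT8.

        Maps the range [-max|x|, +max|x|] onto [-127, 127] using a
        single FP32 scale per tensor (an assumed scheme for illustration).
        """
        scale = np.max(np.abs(x)) / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize_int8(q, scale):
        """Recover an FP32 approximation from the INT8 values."""
        return q.astype(np.float32) * scale

    # INT8 storage is 4x smaller than FP32, and 8-bit matrix math is much
    # faster on hardware with dedicated 8-bit instructions; the cost is a
    # small rounding error, which is the accuracy loss Jun mentions.
    x = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_int8(x)
    print(np.max(np.abs(x - dequantize_int8(q, s))))  # small quantization error
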
