Hi Tao,

I covered in the doc that it is specifically about inference. I can add another section to the FAQ explaining why INT8 quantization is not included.
Anirudh

On Tue, Apr 30, 2019 at 7:59 AM Lv, Tao A <tao.a...@intel.com> wrote:

> Thank you Anirudh! I'm just a little surprised that when we talk about
> mixed precision models we don't talk about training, and when we talk
> about inference, INT8 quantization is not mentioned~
>
> -----Original Message-----
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, April 30, 2019 8:27 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
>
> Hi Zach,
>
> I checked the QuantizeGraph pass and I think it could probably benefit
> from a CSE pass to eliminate the additional quantize/quantize_v2 nodes.
> Having said that, I think it may still be overkill to add another NNVM
> pass for generic common subexpression elimination. Currently, this
> elimination logic takes only an additional 3 to 6 lines of code in each
> of the two NNVM passes. Also, a generic common subexpression elimination
> pass has its own associated maintenance costs. I think it is better to
> continue with the current approach and revisit this need in the future
> as we add more NNVM passes.
>
> Anirudh
>
> On Mon, Apr 29, 2019 at 2:22 PM Anirudh Subramanian <anirudh2...@gmail.com>
> wrote:
>
> > Hi Zach,
> >
> > You raise an interesting point. Thank you for the pointer!
> >
> > Incorporating a CSE pass comes with its own cost, and the advantage it
> > brings is to make the ReducePrecision NNVM pass more lightweight.
> > Since the amortized cost of the ReducePrecision pass is O(1), it
> > shouldn't matter much from a performance point of view whether we add
> > it or not.
> >
> > From a maintenance point of view, I would agree that separating these
> > two pieces of logic can be helpful if we have other such workflows
> > which require the original pass followed by a CSE pass. Currently, as
> > far as I know, only the ReducePrecision pass would use it. I will check
> > whether a CSE pass can also benefit other NNVM passes, like the
> > quantization pass, apart from ReducePrecision, and will get back.
> >
> > Anirudh
> >
> > On Mon, Apr 29, 2019 at 11:18 AM Zach Kimberg <zachary.kimb...@gmail.com>
> > wrote:
> >
> >> I have one suggestion. In the current design, there are the additional
> >> maps from each input entry to each target casted entry dtype in order
> >> to avoid creating duplicate casts. Instead of creating these, another
> >> option is to apply a general purpose Common Subexpression Elimination
> >> (CSE) [1] pass afterwards. So, you would run the mixed precision pass,
> >> which creates the duplicates, and then the CSE pass, which would
> >> remove all duplicates.
> >>
> >> This design is common in existing compilers like LLVM because
> >> maintaining and testing the passes is much easier when they are kept
> >> as simple as possible. The CSE pass can also be reused as necessary
> >> for other passes that could create duplicates, or to remove duplicate
> >> expressions in general. This tutorial [2] talks about it a bit.
> >>
> >> Zach
> >>
> >> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
> >> [2] - https://blog.regehr.org/archives/1603
> >>
> >> On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian <
> >> anirudh2...@gmail.com> wrote:
> >>
> >> > Hi Tao,
> >> >
> >> > Thanks for raising this question! I thought about the existing
> >> > quantization workflow and whether it can be included with the AMP
> >> > API. Although quantization can be considered a form of mixed
> >> > precision, there are differences.
> >> > For example, only a small number of operators can be quantized
> >> > compared to the operators that can run in FP16 precision. Thus,
> >> > overriding the operators to run in the original dtype vs the target
> >> > dtype doesn't make much sense for quantization.
> >> >
> >> > Also, the quantization workflow may require a calibration dataset to
> >> > calibrate the min and max and the calib_mode.
> >> > Arriving at a common API for quantization with calibration and mixed
> >> > precision inference (FP16 and BF16) may make the API too complicated
> >> > and not very easy to use. I understand that this may cause some
> >> > confusion as people may try to use a target_dtype of int8, but I
> >> > think it is still better than causing user confusion with the API
> >> > usage.
> >> >
> >> > Also, when we move the quantize_model APIs outside contrib we can
> >> > consider adding them under the AMP namespace. The challenge would
> >> > then be to educate users on the difference between "quantize" and
> >> > "convert".
> >> >
> >> > Anirudh
> >> >
> >> > On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A <tao.a...@intel.com> wrote:
> >> >
> >> > > Thank you for the explanation. Sorry I didn't realize the proposal
> >> > > is for inference only.
> >> > >
> >> > > Then how do you think the amp_cast and amp_multicast in this
> >> > > proposal can work with the existing INT8 quantization workflow,
> >> > > which I think should also be considered as 'mixed precision'?
> >> > >
> >> > > -----Original Message-----
> >> > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> >> > > Sent: Monday, April 29, 2019 10:25 PM
> >> > > To: dev@mxnet.incubator.apache.org
> >> > > Subject: Re: Proposal for Conversion from FP32 to Mixed Precision
> >> > > Models
> >> > >
> >> > > Hi Tao,
> >> > >
> >> > > The proposed APIs, "convert_model" and "convert_block", are mainly
> >> > > for inference use cases, where customers bring an FP32 model and
> >> > > convert it to a mixed precision model to get improved performance
> >> > > while not losing out on accuracy.
> >> > > The PR https://github.com/apache/incubator-mxnet/pull/14173 is
> >> > > supposed to handle the training use cases, and this proposal
> >> > > doesn't cover the AMP feature added in that PR. I think ptrendx@
> >> > > and canoerst@ are better equipped to answer questions 1 and 2.
> >> > >
> >> > > > - more generally, what will be saved when users want to
> >> > > > serialize their model to disk?
> >> > >
> >> > > Let's say users want to save a converted mixed precision model
> >> > > used for inference to disk. It will save both the symbol, with the
> >> > > amp_cast and amp_multicast operators, and the params (which are
> >> > > casted if necessary).
> >> > >
> >> > > Anirudh
> >> > >
> >> > > On Mon, Apr 29, 2019 at 6:55 AM Lv, Tao A <tao.a...@intel.com> wrote:
> >> > >
> >> > > > Thank you for sharing this, Anirudh.
> >> > > >
> >> > > > Curious to know:
> >> > > > - what will be saved in a training checkpoint or snapshot? Can
> >> > > > it be resumed on another platform which might not support the
> >> > > > lower precision the previous one used?
> >> > > > - what will be saved in the final symbol.json and params file
> >> > > > when training is finished?
> >> > > > - more generally, what will be saved when users want to
> >> > > > serialize their model to disk?
> >> > > >
> >> > > > Thank you,
> >> > > > -tao
> >> > > >
> >> > > > -----Original Message-----
> >> > > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> >> > > > Sent: Monday, April 29, 2019 7:00 PM
> >> > > > To: dev@mxnet.incubator.apache.org
> >> > > > Subject: Proposal for Conversion from FP32 to Mixed Precision
> >> > > > Models
> >> > > >
> >> > > > Hi all,
> >> > > >
> >> > > > I have created a doc for conversion from FP32 to Mixed Precision
> >> > > > Models:
> >> > > >
> >> > > > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> >> > > >
> >> > > > I look forward to your feedback on the same.
> >> > > >
> >> > > > Thanks,
> >> > > > Anirudh
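To make the conversion and serialization flow discussed in the thread concrete, here is a minimal Python sketch. It assumes the proposed conversion API is exposed as convert_model under mxnet.contrib.amp and takes a symbol, arg/aux params, and a target_dtype, as described above; since the proposal was still under review at the time, the exact module path, argument names, and defaults are assumptions and may differ from what was eventually released. The checkpoint prefix "model" is a placeholder.

import mxnet as mx
from mxnet.contrib import amp  # assumed location of the proposed conversion API

# Load an existing FP32 checkpoint ("model-symbol.json" / "model-0000.params").
sym, arg_params, aux_params = mx.model.load_checkpoint("model", 0)

# Convert to a mixed precision model for inference. Operators that are unsafe
# in FP16 keep their original dtype; amp_cast / amp_multicast nodes are
# inserted where tensors cross between FP32 and FP16 parts of the graph.
fp16_sym, fp16_arg_params, fp16_aux_params = amp.convert_model(
    sym, arg_params, aux_params, target_dtype="float16")

# Serialize the converted model: the symbol (including the inserted amp_cast
# and amp_multicast operators) and the params (casted where necessary) are
# what get written to disk, as described in the thread.
fp16_sym.save("model_fp16-symbol.json")
save_dict = {"arg:%s" % name: arr.as_in_context(mx.cpu())
             for name, arr in fp16_arg_params.items()}
save_dict.update({"aux:%s" % name: arr.as_in_context(mx.cpu())
                  for name, arr in fp16_aux_params.items()})
mx.nd.save("model_fp16-0000.params", save_dict)

The Gluon-side counterpart mentioned in the thread, convert_block, would follow the same pattern for hybridized networks, and reloading the saved symbol and params is all that is needed to run mixed precision inference.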