Hi Tao,

I covered in the doc that it is specifically about inference. I can add another section to the FAQ explaining why INT8 quantization is not included.
Anirudh

On Tue, Apr 30, 2019 at 7:59 AM Lv, Tao A <tao.a...@intel.com> wrote:

> Thank you Anirudh! I'm just a little surprised that when we talk about
> mixed precision models we don't talk about training, and when we talk
> about inference, INT8 quantization is not mentioned~
>
> -----Original Message-----
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, April 30, 2019 8:27 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
>
> Hi Zach,
>
> I checked the QuantizeGraph pass and I think it could probably benefit
> from a CSE pass to eliminate the additional quantize/quantize_v2 nodes.
> Having said that, I think it may still be overkill to add another NNVM
> pass for generic common subexpression elimination. Currently, this
> elimination logic takes only an additional 3 to 6 lines of code in each
> of the two NNVM passes. Also, a generic common subexpression elimination
> pass has its own associated maintenance costs. I think it is better to
> continue with the current approach and revisit this need in the future
> as we add more NNVM passes.
>
> Anirudh
>
> On Mon, Apr 29, 2019 at 2:22 PM Anirudh Subramanian <anirudh2...@gmail.com>
> wrote:
>
> > Hi Zach,
> >
> > You raise an interesting point. Thank you for the pointer!
> >
> > Incorporating a CSE pass comes with its own cost, and the advantage it
> > brings is to make the ReducePrecision NNVM pass more lightweight.
> > Since the amortized cost of the ReducePrecision pass is O(1), it
> > shouldn't matter much from a performance point of view whether we add
> > it or not.
> >
> > From a maintenance point of view, I would agree that separating these
> > two pieces of logic can be helpful if we have other such workflows
> > which require the original pass followed by a CSE pass. Currently, as
> > far as I know, only the ReducePrecision pass would use it. I will check
> > whether a CSE pass can also benefit other NNVM passes, like the
> > quantization pass, apart from ReducePrecision, and will get back.
> >
> > Anirudh
> >
> > On Mon, Apr 29, 2019 at 11:18 AM Zach Kimberg <zachary.kimb...@gmail.com>
> > wrote:
> >
> >> I have one suggestion. In the current design, there are the additional
> >> maps from each input entry to each target casted entry dtype in order
> >> to avoid creating duplicate casts. Instead of creating these, another
> >> option is to apply a general purpose Common Subexpression Elimination
> >> (CSE) [1] pass afterwards. So, you would run the mixed precision pass,
> >> which creates the duplicates, and then the CSE pass, which would
> >> remove all duplicates.
> >>
> >> This design is common in existing compilers like LLVM because
> >> maintaining and testing the passes is much easier when they are kept
> >> as simple as possible. The CSE pass can also be reused as necessary
> >> for other passes that could create duplicates, or to remove duplicate
> >> expressions in general. This tutorial [2] talks about it a bit.
> >>
> >> Zach
> >>
> >> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
> >> [2] - https://blog.regehr.org/archives/1603
> >>
> >> On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian <
> >> anirudh2...@gmail.com> wrote:
> >>
> >> > Hi Tao,
> >> >
> >> > Thanks for raising this question! I thought about the existing
> >> > quantization workflow and whether it can be included with the AMP
> >> > API. Although quantization can be considered a form of mixed
> >> > precision, there are differences.
> >> > For example, only a small number of operators can be quantized
> >> > compared to the operators that can run in FP16 precision. Thus,
> >> > overriding the operators to run in the original dtype vs the target
> >> > dtype doesn't make much sense for quantization.
> >> >
> >> > Also, the quantization workflow may require a calibration dataset to
> >> > calibrate the min and max and the calib_mode.
> >> > Arriving at a common API for quantization with calibration and mixed
> >> > precision inference (FP16 and BF16) may make the API too complicated
> >> > and not very easy to use. I understand that this may cause some
> >> > confusion as people may try to use a target_dtype of int8, but I
> >> > think it is still better than causing user confusion with the API
> >> > usage.
> >> >
> >> > Also, when we move the quantize_model APIs outside contrib we can
> >> > consider adding them under the AMP namespace. The challenge would
> >> > then be to educate users on the difference between "quantize" and
> >> > "convert".
> >> >
> >> > Anirudh
> >> >
> >> > On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A <tao.a...@intel.com> wrote:
> >> >
> >> > > Thank you for the explanation. Sorry I didn't realize the proposal
> >> > > is for inference only.
> >> > >
> >> > > Then how do you think the amp_cast and amp_multicast in this
> >> > > proposal can work with the existing INT8 quantization workflow,
> >> > > which I think should also be considered as 'mixed precision'?
> >> > >
> >> > > -----Original Message-----
> >> > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> >> > > Sent: Monday, April 29, 2019 10:25 PM
> >> > > To: dev@mxnet.incubator.apache.org
> >> > > Subject: Re: Proposal for Conversion from FP32 to Mixed Precision
> >> > > Models
> >> > >
> >> > > Hi Tao,
> >> > >
> >> > > The proposed APIs, "convert_model" and "convert_block", are mainly
> >> > > for inference use cases, where customers bring an FP32 model and
> >> > > convert it to a mixed precision model to get improved performance
> >> > > while not losing out on accuracy.
> >> > > The PR https://github.com/apache/incubator-mxnet/pull/14173 is
> >> > > supposed to handle the training use cases, and this proposal
> >> > > doesn't cover the AMP feature added in that PR. I think ptrendx@
> >> > > and canoerst@ are better equipped to answer questions 1 and 2.
> >> > >
> >> > > > - more generally, what will be saved when users want to
> >> > > > serialize their model to disk?
> >> > >
> >> > > Let's say users want to save a converted mixed precision model
> >> > > used for inference to disk. It will save both the symbol, with the
> >> > > amp_cast and amp_multicast operators, and the params (which are
> >> > > casted if necessary).
> >> > >
> >> > > Anirudh
> >> > >
> >> > > On Mon, Apr 29, 2019 at 6:55 AM Lv, Tao A <tao.a...@intel.com> wrote:
> >> > >
> >> > > > Thank you for sharing this, Anirudh.
> >> > > >
> >> > > > Curious to know:
> >> > > > - what will be saved in a training checkpoint or snapshot? Can
> >> > > > it be resumed on another platform which might not support the
> >> > > > lower precision the previous one used?
> >> > > > - what will be saved in the final symbol.json and params file
> >> > > > when training is finished?
> >> > > > - more generally, what will be saved when users want to
> >> > > > serialize their model to disk?
> >> > > >
> >> > > > Thank you,
> >> > > > -tao
> >> > > >
> >> > > > -----Original Message-----
> >> > > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> >> > > > Sent: Monday, April 29, 2019 7:00 PM
> >> > > > To: dev@mxnet.incubator.apache.org
> >> > > > Subject: Proposal for Conversion from FP32 to Mixed Precision
> >> > > > Models
> >> > > >
> >> > > > Hi all,
> >> > > >
> >> > > > I have created a doc for conversion from FP32 to Mixed Precision
> >> > > > Models:
> >> > > >
> >> > > > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> >> > > >
> >> > > > I look forward to your feedback on the same.
> >> > > >
> >> > > > Thanks,
> >> > > > Anirudh
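To make the conversion and serialization flow discussed in the thread concrete, here is a minimal Python sketch. It assumes the proposed conversion API is exposed as convert_model under mxnet.contrib.amp and takes a symbol, arg/aux params, and a target_dtype, as described above; since the proposal was still under review at the time, the exact module path, argument names, and defaults are assumptions and may differ from what was eventually released. The checkpoint prefix "model" is a placeholder.

import mxnet as mx
from mxnet.contrib import amp  # assumed location of the proposed conversion API

# Load an existing FP32 checkpoint ("model-symbol.json" / "model-0000.params").
sym, arg_params, aux_params = mx.model.load_checkpoint("model", 0)

# Convert to a mixed precision model for inference. Operators that are unsafe
# in FP16 keep their original dtype; amp_cast / amp_multicast nodes are
# inserted where tensors cross between FP32 and FP16 parts of the graph.
fp16_sym, fp16_arg_params, fp16_aux_params = amp.convert_model(
    sym, arg_params, aux_params, target_dtype="float16")

# Serialize the converted model: the symbol (including the inserted amp_cast
# and amp_multicast operators) and the params (casted where necessary) are
# what get written to disk, as described in the thread.
fp16_sym.save("model_fp16-symbol.json")
save_dict = {"arg:%s" % name: arr.as_in_context(mx.cpu())
             for name, arr in fp16_arg_params.items()}
save_dict.update({"aux:%s" % name: arr.as_in_context(mx.cpu())
                  for name, arr in fp16_aux_params.items()})
mx.nd.save("model_fp16-0000.params", save_dict)

The Gluon-side counterpart mentioned in the thread, convert_block, would follow the same pattern for hybridized networks, and reloading the saved symbol and params is all that is needed to run mixed precision inference.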