> HOWEVER, as I was writing this reply I realized that, by pure luck, this
> is not actually what happens - optimizers could in fact be in the
> FP32_FUNCS list. That is because AMP assumes that the model being changed
> is an FP32 model at the start, so all the weights (and therefore all the
> gradients just before they are applied) are already stored in FP32, and
> the casting would not actually occur. It would therefore be possible to
> make the FP32 list implicit (although relying on this lucky coincidence
> makes me slightly uncomfortable).
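(For illustration, a minimal sketch of the point above - not code from the
thread, and assuming default Gluon behavior: parameters and their gradient
buffers start out as float32, so an inserted FP32 cast would be an identity.)

    import mxnet as mx
    from mxnet.gluon import nn

    net = nn.Dense(4)
    net.initialize()
    net(mx.nd.ones((1, 8)))           # trigger deferred initialization
    print(net.weight.data().dtype)    # float32 - casting to FP32 changes nothing
    print(net.weight.grad().dtype)    # float32 as well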

This is under the assumption that customers use the default lists. If
customers use their own lists, it can happen that some weights have to be
in FP16 and they may have to do some heavy lifting. My understanding is
that this is the use case that makes you uncomfortable? If not, can you
expand? The common case will be customers using the default lists; the
latter use case may be rare.
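(For reference, the two cases look roughly like this - a sketch; the
fp32_ops keyword follows the amp.init() signature from the 1.5 PR, and its
exact name should be treated as an assumption.)

    from mxnet.contrib import amp

    # Common case: rely on the built-in operator lists.
    amp.init()

    # Rarer case: a customer supplies their own FP32 list (one or the
    # other, not both). With custom lists some tensors may end up in FP16
    # and need extra care.
    amp.init(fp32_ops=['softmax', 'log_softmax'])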

Anirudh


On Thu, May 30, 2019 at 3:25 PM Sheng Zha <zhash...@apache.org> wrote:

> Given that we're switching from the practice of failing the AMP-related
> test to warning, I intend to merge #15085 soon if no objection.
>
> -sz
>
> On 2019/05/30 19:07:07, Przemysław Trędak <ptre...@apache.org> wrote:
> > Hi Sam and Zhennan,
> >
> > the problem is not how to implicitly produce a list of all operators
> > not in any other list - that is easy, and the code Zhennan provided
> > would work. The problem is that such a list would not actually be
> > correct in all cases - you do NOT want optimizers to land in the
> > FP32_FUNCS list, because then the tensor being updated could potentially
> > be the FP32 copy and not the actual tensor you care about. That is why
> > the first bullet point of the error message in the test is along the
> > lines of: "if you add an optimizer or anything else that does not go
> > into the computational graph, put it in the FP16_FP32_FUNCS list" - that
> > list does not do any wrapping.
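(A minimal sketch of the correctness problem described above - illustrative
only, not AMP code: if a cast were inserted in front of an optimizer's
weight input, the update would land on the cast copy instead of the weight
itself.)

    import mxnet as mx

    weight = mx.nd.ones((3,), dtype='float16')
    grad = mx.nd.full((3,), 0.5, dtype='float16')

    w32 = weight.astype('float32')              # the cast AMP would insert
    mx.nd.sgd_update(w32, grad.astype('float32'),
                     lr=0.1, out=w32)           # update applied to the copy...
    print(weight.asnumpy())                     # ...the real weight is still all ones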
> >
> > HOWEVER, as I was writing this reply I realized that, by pure luck, this
> > is not actually what happens - optimizers could in fact be in the
> > FP32_FUNCS list. That is because AMP assumes that the model being changed
> > is an FP32 model at the start, so all the weights (and therefore all the
> > gradients just before they are applied) are already stored in FP32, and
> > the casting would not actually occur. It would therefore be possible to
> > make the FP32 list implicit (although relying on this lucky coincidence
> > makes me slightly uncomfortable).
> >
> > Przemek
> >
> >
> >
> > On 2019/05/30 08:04:05, "Qin, Zhennan" <zhennan....@intel.com> wrote:
> > > How about changing the line below in amp.py:164
> > >
> > > wrap_list = fp32_ops if fp32_ops is not None else lists.symbol.FP32_FUNCS
> > >
> > > to be
> > >
> > >     plist = ctypes.POINTER(ctypes.c_char_p)()
> > >     size = ctypes.c_uint()
> > >
> > >     check_call(_LIB.MXListAllOpNames(ctypes.byref(size),
> > >                                      ctypes.byref(plist)))
> > >     op_names = []
> > >     for i in range(size.value):
> > >         op_names.append(py_str(plist[i]))
> > >
> > >     wrap_list = []
> > >     fp16_op_list = (lists.symbol.FP16_FUNCS
> > >                     + lists.symbol.WIDEST_TYPE_CASTS
> > >                     + lists.symbol.FP16_FP32_FUNCS
> > >                     + list(map(lambda x: x[0],
> > >                                lists.symbol.CONDITIONAL_FP32_FUNCS)))
> > >     for op_name in op_names:
> > >         if (not op_name.startswith("_backward_")
> > >                 and not op_name.startswith("_contrib_backward_")
> > >                 and op_name not in fp16_op_list):
> > >             wrap_list.append(op_name)
> > >     wrap_list = fp32_ops if fp32_ops is not None else wrap_list
> > >
> > >
> > > I've checked that the changed code produces a wrap_list identical to
> > > lists.symbol.FP32_FUNCS.
> > >
> > > The check code is,
> > >
> > >     print("op that in wrap_list but not in FP32_FUNCS:")
> > >     for i in wrap_list:
> > >         if i not in lists.symbol.FP32_FUNCS:
> > >             print(i)
> > >
> > >     print("op that in FP32_FUNCS but not in wrap_list:")
> > >     for i in lists.symbol.FP32_FUNCS:
> > >         if i not in wrap_list:
> > >             print(i)
> > >
> > > The output is (nothing is listed under either header, i.e. the two
> > > lists match):
> > >
> > > ops in wrap_list but not in FP32_FUNCS:
> > > ops in FP32_FUNCS but not in wrap_list:
> > >
> > > This change should automatically treat new operators as FP32-only.
> > >
> > > Thanks,
> > > Zhennan
> > >
> > > On Wed, 2019-05-29 at 19:13 +0000, Skalicky, Sam wrote:
> > > > Thanks Przemek for the additional explanation, but I’m still confused
> > > > about this part. I don’t understand the explanation of the optimizer’s
> > > > interaction here.
> > > >
> > > > > > The technical reason is that the only place from which one
> > > > > > can get MXNet operators is the MXListAllOps call, which returns
> > > > > > all operators without any way of differentiating them, including
> > > > > > for example calls to optimizers. If somebody added a new
> > > > > > optimizer and AMP inserted an FP32 cast before its input, the
> > > > > > optimizer would actually update the FP32 copy of the tensor
> > > > > > instead of the tensor itself. This is not a performance problem
> > > > > > (which, I agree, would be perfectly fine to solve later) - it is
> > > > > > a correctness problem.
> > > > > > That is why the error message in the test explicitly mentions
> > > > > > this case as one where the operator should be put in the list
> > > > > > that does not make any casts.
> > > >
> > > >
> > > > If this is the third list that Anirudh mentions:
> > > >
> > > > > > 3. Not sure, don't know, or don't care: FP32 list.
> > > >
> > > >
> > > > and this list simply contains all operators that have not been
> > > > categorized, why do we need the list at all?
> > > >
> > > > Thanks,
> > > > Sam
> > > >
> > > >
> > > > > On May 29, 2019, at 11:58 AM, Sheng Zha <zhash...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > The second misunderstanding is that the ask from the test is to
> > > > > > somehow "add support" for AMP in the operator. That is definitely
> > > > > > not true and adding a single line in either FP32_FUNCS (cast me
> > > > > > always to FP32) or FP16_FP32_FUNCS (do not do anything with me
> > > > > > because I'm not relevant for this use case) is perfectly fine.
> > > > >
> > > > > I respectfully disagree with this statement. In this case, the
> > > > > "support to add" is exactly to differentiate how the new operator
> > > > > should be treated. It was not there before, so it's new. And
> > > > > requiring other developers to add any line is totally not fine.
> > > > >
> > > > > > The test ensures that the list is updated at the same time as a
> > > > > > new operator is introduced.
> > > > >
> > > > > It would indeed have that effect. The problem is that the test
> > > > > requires the wrong person to add such support. A developer who adds
> > > > > a new operator should be under no obligation to do anything with
> > > > > AMP.
> > > > >
> > > > > Having it as a warning, both at runtime and in the test, would
> > > > > prompt the people who actually use AMP and care about its
> > > > > performance to take the necessary steps and decide where a new
> > > > > operator fits.
> > > > > And like you said, "It is up to people like me actively working on
> > > > > better FP16 support to go through the FP32 list to introduce
> > > > > support and then move such operator to another list." This is also
> > > > > exactly how it should work with AMP.
> > > > >
> > > > > -sz
> > > > >
> > > > > On 2019/05/29 18:25:46, Przemysław Trędak <ptre...@apache.org>
> > > > > wrote:
> > > > > > Thank you all for responding.
> > > > > >
> > > > > > Let me address a few misunderstandings about AMP in the
> > > > > > discussion so far:
> > > > > > - I think the main misunderstanding in the discussion so far is
> > > > > > that it would be somehow possible to implicitly cast to FP32 all
> > > > > > previously unseen operators. I would much prefer such a solution
> > > > > > myself, however this is actually (unfortunately) not really
> > > > > > possible. The technical reason is that the only place from which
> > > > > > one can get MXNet operators is the MXListAllOps call, which
> > > > > > returns all operators without any way of differentiating them,
> > > > > > including for example calls to optimizers. If somebody added a
> > > > > > new optimizer and AMP inserted an FP32 cast before its input, the
> > > > > > optimizer would actually update the FP32 copy of the tensor
> > > > > > instead of the tensor itself. This is not a performance problem
> > > > > > (which, I agree, would be perfectly fine to solve later) - it is
> > > > > > a correctness problem. That is why the error message in the test
> > > > > > explicitly mentions this case as one where the operator should be
> > > > > > put in the list that does not make any casts.
> > > > > > - The second misunderstanding is that the ask from the test is to
> > > > > > somehow "add support" for AMP in the operator. That is definitely
> > > > > > not true - adding a single line in either FP32_FUNCS (cast me
> > > > > > always to FP32) or FP16_FP32_FUNCS (do not do anything with me
> > > > > > because I'm not relevant for this use case) is perfectly fine
> > > > > > (see the sketch after this list). It is up to people like me,
> > > > > > actively working on better FP16 support, to go through the FP32
> > > > > > list to introduce support and then move such operators to another
> > > > > > list.
> > > > > > - The third (although minor) misunderstanding is that this
> > > > > > feature is completely GPU-specific. It is not, and in fact the
> > > > > > review comment that prompted me to introduce the design as it is
> > > > > > currently implemented came from an MKLDNN developer :-).
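(Concretely, "adding a single line" means extending one of the lists in
python/mxnet/contrib/amp/lists/symbol.py, roughly as below - the op names
here are hypothetical.)

    FP32_FUNCS = [
        # ... existing entries ...
        'my_new_op',                # hypothetical: always cast its inputs to FP32
    ]

    FP16_FP32_FUNCS = [
        # ... existing entries ...
        'my_new_optimizer_update',  # hypothetical: leave untouched, no casts inserted
    ]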
> > > > > >
> > > > > > About the proposal to move it from a Python list into an operator
> > > > > > property - I am in favor of it and, as Anirudh mentioned, this
> > > > > > should make it much easier when just adding aliases to operators.
> > > > > > But again, this could not be an optional property, unfortunately,
> > > > > > since an automatic "put me in the FP32 list" is not possible.
> > > > > > About the proposal to change the error into a warning and then
> > > > > > rely on people interested in AMP to add operators to the AMP
> > > > > > lists - I do not like this proposal because of the user
> > > > > > experience it would introduce. Basically, a user who downloads a
> > > > > > nightly build of MXNet (because they needed a feature introduced
> > > > > > after the last official release) would likely find that AMP is
> > > > > > broken in that build, because the build happened to fall between
> > > > > > updates to the AMP lists. The test ensures that the list is
> > > > > > updated at the same time as a new operator is introduced. This is
> > > > > > really a question of how we treat nightly releases - should the
> > > > > > user expect all the features to generally work in them or not? If
> > > > > > the Community thinks that is an acceptable outcome, then I am
> > > > > > fine with the proposal and with regularly updating those lists
> > > > > > myself.
> > > > > >
> > > > > > Thank you
> > > > > > Przemek
> > > > > >
> > > > > > On 2019/05/28 21:32:42, Przemysław Trędak <ptre...@apache.org>
> > > > > > wrote:
> > > > > > > Dear Community,
> > > > > > >
> > > > > > > One of the recently merged features of the 1.5 release, AMP
> > > > > > > (Automatic Mixed Precision) support (PR [1], design doc [5]),
> > > > > > > introduced a requirement that every new operator added to MXNet
> > > > > > > would need to be present in 1 of the lists (in [2]). To make
> > > > > > > sure that this requirement is not broken when somebody adds a
> > > > > > > new operator and does not know about AMP's existence, a test
> > > > > > > was added to CI ([3]).
> > > > > > >
> > > > > > > A few people reached out to me (the original author of the
> > > > > > > feature) saying this test increases the burden on developers of
> > > > > > > new operators and should not be an actual error, but just a
> > > > > > > warning (PR for that change [4]). That is why I would like to
> > > > > > > present the motivation for it and discuss with the wider
> > > > > > > audience why I feel it was necessary.
> > > > > > >
> > > > > > > First, for people who do not know the details of what AMP is -
> > > > > > > it is a solution that tries to automatically apply the best
> > > > > > > practices of training in lower precision (FP16) to a user's
> > > > > > > FP32 model in order to fully utilize the capabilities of modern
> > > > > > > GPUs (and potentially other hardware in the future). It does so
> > > > > > > by casting the inputs of operators that benefit from lower
> > > > > > > precision down to FP16, while casting the inputs of operators
> > > > > > > that are unsafe to run in lower precision, or simply do not
> > > > > > > support it, back to full precision.
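(For context, typical AMP usage looks roughly like this - a sketch
following the API described in the AMP PR and tutorial; details may
differ.)

    import mxnet as mx
    from mxnet.contrib import amp

    amp.init()                        # patch operators per the AMP lists,
                                      # before the model is created
    net = mx.gluon.nn.Dense(10)
    net.initialize()
    trainer = mx.gluon.Trainer(net.collect_params(), 'sgd',
                               {'learning_rate': 0.1})
    amp.init_trainer(trainer)         # enable dynamic loss scaling
    data = mx.nd.ones((4, 8))
    with mx.autograd.record():
        loss = net(data).mean()
        with amp.scale_loss(loss, trainer) as scaled_loss:
            mx.autograd.backward(scaled_loss)
    trainer.step(4)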
> > > > > > >
> > > > > > > The first iteration of AMP kept 2 main lists of operators -
> > > > > > > operators that are beneficial and safe to run in FP16, and
> > > > > > > operators that need to be cast to FP32. The problem (raised in
> > > > > > > review of the PR [6], [8]) is how to make sure that the feature
> > > > > > > works as intended and is not inadvertently broken by somebody
> > > > > > > adding a new operator. The failure scenario here is adding a
> > > > > > > new operator that does not support FP16 and so should be cast
> > > > > > > to FP32, but AMP does not know about its existence and so does
> > > > > > > not do the casting. The solution proposed in the review was to
> > > > > > > implicitly treat all unknown operators as FP32-only and keep
> > > > > > > only the list of operators that work fine in both FP16 and
> > > > > > > FP32. This solution, however, does not really work, because
> > > > > > > there are multiple operators (most notably optimizers) where
> > > > > > > introducing an additional cast of the input to FP32 would break
> > > > > > > the operator.
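(To make the casting concrete, a rough sketch of what wrapping an FP32-only
operator amounts to - an illustration of the idea, not AMP's actual
implementation; amp_cast is the cast operator added in the AMP PR [1].)

    import mxnet as mx

    x = mx.sym.Variable('x')                    # may arrive as FP16 in a mixed graph
    x32 = mx.sym.amp_cast(x, dtype='float32')   # the cast AMP inserts before the op
    y = mx.sym.softmax(x32)                     # treated as FP32-only here, for illustration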
> > > > > > >
> > > > > > > That is why, after discussion with a few members of the
> > > > > > > community, I decided to proceed with all lists being explicit
> > > > > > > and to introduce a test that fails when somebody adds an
> > > > > > > operator without classifying it into 1 of the categories,
> > > > > > > explaining clearly how to do so [7]. It is not an ideal
> > > > > > > solution, as it places some burden on developers who are not
> > > > > > > aware of AMP; however, in the typical case of adding at most a
> > > > > > > few operators to MXNet, the inconvenience is, I think, pretty
> > > > > > > minor, while being important for the feature's correctness
> > > > > > > going forward.
> > > > > > >
> > > > > > > I would like to gather Community feedback and ideas on how to
> > > > > > > handle this situation.
> > > > > > >
> > > > > > > [1] https://github.com/apache/incubator-mxnet/pull/14173
> > > > > > > [2]
> > > > > > >
> https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/amp/lists/symbol.py
> > > > > > > [3]
> > > > > > >
> https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py
> > > > > > > [4] https://github.com/apache/incubator-mxnet/pull/15085
> > > > > > > [5]
> > > > > > >
> https://docs.google.com/document/d/1sQzMoPEwux0WXSWirY07us1POD_6Y8pLYq--b9Fvd1o/edit?usp=sharing
> > > > > > > [6]
> > > > > > >
> https://github.com/apache/incubator-mxnet/pull/14173#discussion_r270728019
> > > > > > > [7]
> > > > > > >
> https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py#L62-L80
> > > > > > > [8]
> > > > > > >
> https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341
> > > > > > >
> > > >
> > > >
> > >
> >
>
