Given that we're switching from the practice of failing the AMP-related test to 
warning, I intend to merge #15085 soon if there are no objections.

-sz

On 2019/05/30 19:07:07, Przemysław Trędak <[email protected]> wrote: 
> Hi Sam and Zhennan, 
> 
> the problem is not how to implicitly produce a list of all operators not in 
> any other list - that is easy and the code Zhennan provided would work. The 
> problem is that such a list would not actually be correct in all cases - you do 
> NOT want optimizers to land in the FP32_FUNCS list, because then the tensor being 
> updated could potentially be the FP32 copy and not the actual tensor you care 
> about. That is why the first bullet point in the error message in the test is 
> along the lines of: "if you add an optimizer or anything else that does not go 
> into the computational graph, put it in the FP16_FP32_FUNCS list" - that list 
> does not do any wrapping.
> 
> HOWEVER, as I was writing this reply I realized that due to pure luck this is 
> not actually what happens - optimizers could in fact be in the FP32_FUNCS 
> list. That is because, since AMP assumes that the model being changed is an 
> FP32 model at the start, all weights (and so all the gradients just before 
> application as well) are already stored in FP32 (and so the casting would not 
> actually occur). It would therefore be possible to make the FP32 list 
> implicit (although relying on this lucky coincidence makes me slightly 
> uncomfortable).
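> 
> To make the hazard concrete, here is a minimal NumPy sketch (a stand-in, 
> not the actual AMP machinery; the names are illustrative) of why a blindly 
> inserted FP32 cast would break an in-place optimizer update - and why it 
> happens to be benign when the weight is already FP32:
> 
>     import numpy as np
> 
>     def sgd_update(weight, grad, lr=0.1):
>         weight -= lr * grad  # in-place, like MXNet's sgd_update
> 
>     weight = np.ones(3, dtype=np.float16)
>     grad = np.ones(3, dtype=np.float16)
> 
>     # If AMP blindly inserted an FP32 cast before the optimizer:
>     weight_fp32 = weight.astype(np.float32)  # the cast creates a copy
>     sgd_update(weight_fp32, grad.astype(np.float32))
> 
>     print(weight)       # unchanged - the real weight was never updated
>     print(weight_fp32)  # the FP32 copy received the update instead
> 
>     # With FP32 weights (AMP's starting assumption) no cast is inserted,
>     # so the update hits the real tensor - the lucky coincidence above.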
> 
> Przemek 
> 
> 
> 
> On 2019/05/30 08:04:05, "Qin, Zhennan" <[email protected]> wrote: 
> > How about changing the line below in amp.py:164
> > 
> >     wrap_list = fp32_ops if fp32_ops is not None else lists.symbol.FP32_FUNCS
> > 
> > to be
> > 
> >     plist = ctypes.POINTER(ctypes.c_char_p)()
> >     size = ctypes.c_uint()
> > 
> >     # Enumerate every operator registered with MXNet.
> >     check_call(_LIB.MXListAllOpNames(ctypes.byref(size),
> >                                      ctypes.byref(plist)))
> >     op_names = []
> >     for i in range(size.value):
> >         op_names.append(py_str(plist[i]))
> > 
> >     # All operators already covered by one of the other AMP lists.
> >     fp16_op_list = (lists.symbol.FP16_FUNCS +
> >                     lists.symbol.WIDEST_TYPE_CASTS +
> >                     lists.symbol.FP16_FP32_FUNCS +
> >                     [x[0] for x in lists.symbol.CONDITIONAL_FP32_FUNCS])
> > 
> >     # Everything else (except backward ops) is treated as FP32-only.
> >     wrap_list = []
> >     for op_name in op_names:
> >         if (not op_name.startswith("_backward_") and
> >                 not op_name.startswith("_contrib_backward_") and
> >                 op_name not in fp16_op_list):
> >             wrap_list.append(op_name)
> >     wrap_list = fp32_ops if fp32_ops is not None else wrap_list
> > 
> > 
> > I've checked: the changed code produces a wrap_list identical to
> > lists.symbol.FP32_FUNCS.
> > 
> > The check code is,
> > 
> >     print("op that in wrap_list but not in FP32_FUNCS:")
> >     for i in wrap_list:
> >         if i not in lists.symbol.FP32_FUNCS:
> >             print(i)
> > 
> >     print("op that in FP32_FUNCS but not in wrap_list:")
> >     for i in lists.symbol.FP32_FUNCS:
> >         if i not in wrap_list:
> >             print(i)
> > 
> > The output is,
> > 
> > ops in wrap_list but not in FP32_FUNCS:
> > ops in FP32_FUNCS but not in wrap_list:
> > 
> > (both empty, i.e. the two lists match exactly)
> > 
> > This change should automatically treat any new operator as FP32-only.
> > 
> > Thanks,
> > Zhennan
> > 
> > On Wed, 2019-05-29 at 19:13 +0000, Skalicky, Sam wrote:
> > > Thanks Przemek for the additional explanation, but I’m still confused
> > > about this part. I don’t understand the explanation of the optimizer’s
> > > interaction here. 
> > > 
> > > > > The technical reason is that the only place from which one
> > > > > can get MXNet operators is the MXListAllOps call, which gives back
> > > > > all operators without any way of differentiating them, including
> > > > > for example calls to optimizers. If somebody added a new
> > > > > optimizer and AMP inserted an FP32 cast before its input, the
> > > > > optimizer would actually update the FP32 copy of the tensor
> > > > > instead of the tensor itself. This is not a performance problem
> > > > > (which, I agree, would be perfectly fine to solve later) - it is
> > > > > a correctness problem.
> > > > > That is why the error message in the test explicitly mentions
> > > > > this case as one where the operator should be put in the list
> > > > > that does not make any casts.
> > > 
> > > 
> > > If this is third list that Anirudh mentions:
> > > 
> > > > > 3. Not sure, don't know, or don't care: FP32 list.
> > > 
> > > 
> > > and this list simply contains all operators that have not been
> > > categorized, why do we need the list at all? 
> > > 
> > > Thanks,
> > > Sam
> > > 
> > > 
> > > > On May 29, 2019, at 11:58 AM, Sheng Zha <[email protected]>
> > > > wrote:
> > > > 
> > > > > The second misunderstanding is that the ask from the test is to
> > > > > somehow "add support" for AMP in the operator. That is definitely
> > > > > not true and adding a single line in either FP32_FUNCS (cast me
> > > > > always to FP32) or FP16_FP32_FUNCS (do not do anything with me
> > > > > because I'm not relevant for this usecase) is perfectly fine.
> > > > 
> > > > I respectfully disagree with this statement. In this case, the
> > > > "support to add" is exactly to differentiate how the new operator
> > > > should be treated. It was not there before, so it's new. And
> > > > requiring other developers to add any line is totally not fine.
> > > > 
> > > > > The test ensures that the list is updated at the same time as new
> > > > > operator is introduced.
> > > > 
> > > > It would indeed have that effect. The problem is that the test
> > > > requires the wrong person to add such support. A developer who adds
> > > > a new operator should be under no obligation to do anything with
> > > > AMP.
> > > > 
> > > > Having it as a warning both at runtime and in the test prompts the
> > > > people who actually use and care about the performance of AMP to take
> > > > the steps and decide where a new operator fits.
> > > > And like you said, "It is up to people like me actively working on
> > > > better FP16 support to go through the FP32 list to introduce
> > > > support and then move such operator to another list." This is also
> > > > exactly how it should work with AMP.
> > > > 
> > > > -sz
> > > > 
> > > > On 2019/05/29 18:25:46, Przemysław Trędak <[email protected]>
> > > > wrote: 
> > > > > Thank you all for responding.
> > > > > 
> > > > > Let me address a few misunderstandings about AMP in the
> > > > > discussion so far:
> > > > > - I think the main misunderstanding in the discussion so far is
> > > > > that it would be somehow possible to implicitly cast to FP32 all
> > > > > previously unseen operators. I would much prefer such a solution
> > > > > myself; however, this is actually (unfortunately) not really
> > > > > possible. The technical reason is that the only place from
> > > > > which one can get MXNet operators is the MXListAllOps call, which
> > > > > gives back all operators without any way of differentiating them,
> > > > > including for example calls to optimizers. If somebody added a
> > > > > new optimizer and AMP inserted an FP32 cast before its input, the
> > > > > optimizer would actually update the FP32 copy of the tensor
> > > > > instead of the tensor itself. This is not a performance problem
> > > > > (which, I agree, would be perfectly fine to solve later) - it is
> > > > > a correctness problem. That is why the error message in the test
> > > > > explicitly mentions this case as one where the operator should be
> > > > > put in the list that does not make any casts.
> > > > > - The second misunderstanding is that the ask from the test is to
> > > > > somehow "add support" for AMP in the operator. That is definitely
> > > > > not true and adding a single line in either FP32_FUNCS (cast me
> > > > > always to FP32) or FP16_FP32_FUNCS (do not do anything with me
> > > > > because I'm not relevant for this usecase) is perfectly fine. It
> > > > > is up to people like me actively working on better FP16 support
> > > > > to go through the FP32 list to introduce support and then move
> > > > > such operator to another list.
> > > > > - The third (although minor) misunderstanding is that this
> > > > > feature is completely GPU-specific. It is not, and actually the
> > > > > review comment that prompted me to introduce the design as it is
> > > > > currently implemented comes from an MKLDNN developer :-).
> > > > > 
> > > > > About the proposal to move it from a Python list into an operator
> > > > > property - I am in favor of it and, as Anirudh mentioned, this
> > > > > should make it much easier when just adding aliases to the
> > > > > operators. But again, this could unfortunately not be an optional
> > > > > property, since an automatic "put me in the FP32 list" is not
> > > > > possible.
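> > > > > 
> > > > > As a purely hypothetical sketch of that direction (neither
> > > > > get_amp_category nor an "amp_category" property exists in MXNet
> > > > > today; this only illustrates the shape of the proposal), the
> > > > > lists could then be derived instead of hand-maintained:
> > > > > 
> > > > >     import ctypes
> > > > >     from mxnet.base import _LIB, check_call, py_str
> > > > > 
> > > > >     def list_all_op_names():
> > > > >         plist = ctypes.POINTER(ctypes.c_char_p)()
> > > > >         size = ctypes.c_uint()
> > > > >         check_call(_LIB.MXListAllOpNames(ctypes.byref(size),
> > > > >                                          ctypes.byref(plist)))
> > > > >         return [py_str(plist[i]) for i in range(size.value)]
> > > > > 
> > > > >     def build_amp_lists(get_amp_category):
> > > > >         # get_amp_category(name) would return the category that
> > > > >         # the operator author registered. Note it could not
> > > > >         # default to "FP32_FUNCS": a blind FP32 cast breaks
> > > > >         # in-place operators such as the optimizers.
> > > > >         amp_lists = {"FP16_FUNCS": [], "FP32_FUNCS": [],
> > > > >                      "FP16_FP32_FUNCS": []}
> > > > >         for name in list_all_op_names():
> > > > >             amp_lists[get_amp_category(name)].append(name)
> > > > >         return amp_lists
> > > > > 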
> > > > > About the proposal to change the error into a warning and then
> > > > > rely on people interested in AMP to add operators to the AMP lists
> > > > > - I do not like this proposal because of the user experience it
> > > > > would introduce. Basically, what would happen is that a user who
> > > > > downloads a nightly build of MXNet (because they needed a feature
> > > > > introduced after the last official release) would likely find that
> > > > > AMP is broken in that build, because the build happened between
> > > > > updates to the AMP lists. The test ensures that the lists are
> > > > > updated at the same time as a new operator is introduced. This is
> > > > > really a question of how we treat nightly releases - should the
> > > > > user expect all the features to generally work in them or not? If
> > > > > the Community thinks that this is an acceptable outcome, then I am
> > > > > fine with the proposal and with regularly updating those lists
> > > > > myself.
> > > > > 
> > > > > Thank you
> > > > > Przemek
> > > > > 
> > > > > On 2019/05/28 21:32:42, Przemysław Trędak <[email protected]>
> > > > > wrote: 
> > > > > > Dear Community,
> > > > > > 
> > > > > > One of the recently merged features of the 1.5 release, AMP
> > > > > > (Automatic Mixed Precision) support (PR [1], design doc [5]),
> > > > > > introduced a requirement that every new operator added to MXNet
> > > > > > would need to be present in one of the lists (in [2]). To make
> > > > > > sure that this requirement is not broken when somebody adds a
> > > > > > new operator and does not know about AMP's existence, a test
> > > > > > was added to CI ([3]).
> > > > > > 
> > > > > > A few people reached out to me (the original author of the
> > > > > > feature) saying this test increases the burden on developers of
> > > > > > new operators and should not be an actual error, but just a
> > > > > > warning (PR for that change [4]). That is why I would like to
> > > > > > present a motivation for it and discuss with the wider audience
> > > > > > why I feel it was necessary.
> > > > > > 
> > > > > > First, for people who do not know the details of what AMP is -
> > > > > > it is a solution that tries to automatically apply best
> > > > > > practices of training in lower precision (FP16) to a user's FP32
> > > > > > model in order to fully utilize the capabilities of modern GPUs
> > > > > > (and potentially other hardware in the future). It does so by
> > > > > > casting the inputs of operators that benefit from lower precision
> > > > > > down to FP16, while casting the inputs of operators that are
> > > > > > unsafe to run in lower precision (or simply do not support it)
> > > > > > back to full precision.
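> > > > > > 
> > > > > > For context, a minimal sketch of the user-facing side (against
> > > > > > the 1.5 contrib API): calling amp.init() before the network is
> > > > > > built patches the operators so that the casts described above
> > > > > > are inserted automatically.
> > > > > > 
> > > > > >     from mxnet.contrib import amp
> > > > > > 
> > > > > >     amp.init()  # call once, before constructing the network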
> > > > > > 
> > > > > > The first iteration of AMP kept two main lists of operators -
> > > > > > operators that are beneficial and safe to run in FP16 and
> > > > > > operators that need to be cast to FP32. The problem (raised in
> > > > > > review of the PR [6], [8]) is how to make sure that the feature
> > > > > > works as intended and is not inadvertently broken by somebody
> > > > > > adding a new operator. The failure scenario here is adding a
> > > > > > new operator that does not support FP16 and so should be cast
> > > > > > to FP32, but AMP does not know about its existence and so does
> > > > > > not do the casting. The solution proposed in the review was to
> > > > > > implicitly treat all of the unknown operators as FP32-only and
> > > > > > keep the list of operators that work fine in both FP16 and
> > > > > > FP32. This solution, however, does not really work, because there
> > > > > > are multiple operators (most notably optimizers) where
> > > > > > introducing additional casting of the input to FP32 would break
> > > > > > the operator.
> > > > > > 
> > > > > > That is why, after discussion with a few members of the
> > > > > > community, I decided to proceed with all lists being explicit
> > > > > > and to introduce a test that fails when somebody adds an
> > > > > > operator without classifying it into one of the categories, and
> > > > > > that explains clearly how to do so [7]. It is not an ideal
> > > > > > solution, as it introduces some burden on developers who are
> > > > > > not aware of AMP; however, in the typical case of adding at
> > > > > > most a few operators to MXNet, the inconvenience is, I think,
> > > > > > pretty minor, while being important for the feature's
> > > > > > correctness going forward.
> > > > > > 
> > > > > > I would like to gather Community feedback and ideas on how to
> > > > > > handle this situation.
> > > > > > 
> > > > > > [1] https://github.com/apache/incubator-mxnet/pull/14173
> > > > > > [2] 
> > > > > > https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/amp/lists/symbol.py
> > > > > > [3] 
> > > > > > https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py
> > > > > > [4] https://github.com/apache/incubator-mxnet/pull/15085
> > > > > > [5] 
> > > > > > https://docs.google.com/document/d/1sQzMoPEwux0WXSWirY07us1POD_6Y8pLYq--b9Fvd1o/edit?usp=sharing
> > > > > > [6] 
> > > > > > https://github.com/apache/incubator-mxnet/pull/14173#discussion_r270728019
> > > > > > [7] 
> > > > > > https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py#L62-L80
> > > > > > [8] 
> > > > > > https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341
> > > > > > 
> > > 
> > > 
> > 
> 
