How about changing the line below in amp.py:164

    wrap_list = fp32_ops if fp32_ops is not None else lists.symbol.FP32_FUNCS

to be

    plist = ctypes.POINTER(ctypes.c_char_p)()
    size = ctypes.c_uint()

    # Query the full operator name list from the MXNet C API.
    check_call(_LIB.MXListAllOpNames(ctypes.byref(size),
                                     ctypes.byref(plist)))
    op_names = []
    for i in range(size.value):
        op_names.append(py_str(plist[i]))

    # Every operator that is already categorized in one of the FP16 lists.
    fp16_op_list = (lists.symbol.FP16_FUNCS
                    + lists.symbol.WIDEST_TYPE_CASTS
                    + lists.symbol.FP16_FP32_FUNCS
                    + list(map(lambda x: x[0],
                               lists.symbol.CONDITIONAL_FP32_FUNCS)))

    # Every remaining forward operator is inferred to be FP32-only.
    wrap_list = []
    for op_name in op_names:
        if (not op_name.startswith("_backward_")
                and not op_name.startswith("_contrib_backward_")
                and op_name not in fp16_op_list):
            wrap_list.append(op_name)
    wrap_list = fp32_ops if fp32_ops is not None else wrap_list
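
The snippet relies on helpers that amp.py should already import near the
top of the module, roughly the following (an assumption on my side -
please verify against the actual file):

    # Presumed already available in amp.py's module scope:
    import ctypes
    from ...base import _LIB, check_call, py_str
    from . import lists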


I've checked that the changed code produces a wrap_list identical to
lists.symbol.FP32_FUNCS.

The check code is,

    print("op that in wrap_list but not in FP32_FUNCS:")
    for i in wrap_list:
        if i not in lists.symbol.FP32_FUNCS:
            print(i)

    print("op that in FP32_FUNCS but not in wrap_list:")
    for i in lists.symbol.FP32_FUNCS:
        if i not in wrap_list:
            print(i)

The output is,

    ops in wrap_list but not in FP32_FUNCS:
    ops in FP32_FUNCS but not in wrap_list:

Both sections are empty, i.e. the inferred wrap_list matches
lists.symbol.FP32_FUNCS exactly.

This change should automatically classify any newly added operator as
FP32-only.
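
For illustration, here is a minimal standalone sketch of the same
complement logic, with purely hypothetical list contents and operator
names (my_brand_new_op stands in for an operator added after the AMP
lists were last updated):

    # Standalone sketch of the complement-based inference above.
    # All list contents and operator names are hypothetical examples.
    FP16_FUNCS = ["Convolution"]
    FP16_FP32_FUNCS = ["Activation"]
    WIDEST_TYPE_CASTS = ["broadcast_add"]
    CONDITIONAL_FP32_FUNCS = [("Activation", "act_type", ["softrelu"])]

    all_op_names = ["Convolution", "Activation", "broadcast_add",
                    "softmax", "my_brand_new_op", "_backward_Convolution"]

    fp16_op_list = (FP16_FUNCS + WIDEST_TYPE_CASTS + FP16_FP32_FUNCS
                    + [x[0] for x in CONDITIONAL_FP32_FUNCS])

    # Any forward op not categorized above is inferred to be FP32-only,
    # so a newly added operator lands here by default.
    wrap_list = [op for op in all_op_names
                 if not op.startswith(("_backward_", "_contrib_backward_"))
                 and op not in fp16_op_list]

    print(wrap_list)  # ['softmax', 'my_brand_new_op']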

Thanks,
Zhennan

On Wed, 2019-05-29 at 19:13 +0000, Skalicky, Sam wrote:
> Thanks Przemek for the additional explanation, but I’m still confused
> about this part. I don’t understand the explanation of the optimizer’s
> interaction here.
> 
> > > The technical reason for that is that the only place from which
> > > one can get MXNet operators is the MXListAllOps call, which gives
> > > back all operators without any way of differentiating them,
> > > including for example calls to optimizers. If somebody added a new
> > > optimizer and AMP inserts an FP32 cast before its input, the
> > > optimizer would actually update the FP32 copy of the tensor
> > > instead of the tensor itself. This is not a performance problem
> > > (which, I agree, would be perfectly fine to solve later) - it is
> > > a correctness problem.
> > > That is why the error message in the test explicitly mentions
> > > this case as one where the operator should be put in the list
> > > that does not make any casts.
> 
> 
> If this is the third list that Anirudh mentions:
> 
> > > 3. Not sure, Don't know or dont care: FP32 list.
> 
> 
> and this list simply contains all operators that have not been
> categorized, why do we need the list at all? 
> 
> Thanks,
> Sam
> 
> 
> > On May 29, 2019, at 11:58 AM, Sheng Zha <zhash...@apache.org>
> > wrote:
> > 
> > > The second misunderstanding is that the ask from the test is to
> > > somehow "add support" for AMP in the operator. That is definitely
> > > not true and adding a single line in either FP32_FUNCS (cast me
> > > always to FP32) or FP16_FP32_FUNCS (do not do anything with me
> > > because I'm not relevant for this use case) is perfectly fine.
> > 
> > I respectfully disagree with this statement. In this case, the
> > "support to add" is exactly to differentiate how the new operator
> > should be treated. It was not there before, so it's new. And
> > requiring other developers to add any line is totally not fine.
> > 
> > > The test ensures that the list is updated at the same time as a
> > > new operator is introduced.
> > 
> > It would indeed have that effect. The problem is that the test
> > requires the wrong person to add such support. A developer who adds
> > a new operator should be under no obligation to do anything with
> > AMP.
> > 
> > Having it as a warning both at runtime and in the test has the effect
> > of prompting people who actually use and care about the performance
> > of AMP to take the steps and decide where a new operator fits.
> > And like you said, "It is up to people like me actively working on
> > better FP16 support to go through the FP32 list to introduce
> > support and then move such operator to another list." This is also
> > exactly how it should work with AMP.
> > 
> > -sz
> > 
> > On 2019/05/29 18:25:46, Przemysław Trędak <ptre...@apache.org>
> > wrote: 
> > > Thank you all for responding.
> > > 
> > > Let me address a few misunderstandings about AMP in the
> > > discussion so far:
> > > - I think the main misunderstanding in the discussion so far is
> > > that it would be somehow possible to implicitly cast to FP32 all
> > > previously unseen operators. I would much prefer such a solution
> > > myself; however, this is actually (unfortunately) not really
> > > possible. The technical reason for that is that the only place
> > > from which one can get MXNet operators is the MXListAllOps call,
> > > which gives back all operators without any way of differentiating
> > > them, including for example calls to optimizers. If somebody added a
> > > new optimizer and AMP inserts an FP32 cast before its input, the
> > > optimizer would actually update the FP32 copy of the tensor
> > > instead of the tensor itself. This is not a performance problem
> > > (which, I agree, would be perfectly fine to solve later) - it is
> > > a correctness problem. That is why the error message in the test
> > > explicitly mentions this case as one where the operator should be
> > > put in the list that does not make any casts.
> > > - The second misunderstanding is that the ask from the test is to
> > > somehow "add support" for AMP in the operator. That is definitely
> > > not true and adding a single line in either FP32_FUNCS (cast me
> > > always to FP32) or FP16_FP32_FUNCS (do not do anything with me
> > > because I'm not relevant for this use case) is perfectly fine. It
> > > is up to people like me actively working on better FP16 support
> > > to go through the FP32 list to introduce support and then move
> > > such operator to another list.
> > > - The third (although minor) misunderstanding is that this
> > > feature is completely GPU-specific. It is not, and actually the
> > > review comment that prompted me to introduce the design as it is
> > > currently implemented comes from an MKLDNN developer :-).
> > > 
> > > About the proposal to move it from a Python list into an operator
> > > property - I am in favor of it and, as Anirudh mentioned, this
> > > should make it much easier when just adding aliases to the
> > > operators. But again, this could not be an optional parameter,
> > > unfortunately, since an automatic "put me in the FP32 list" is not
> > > possible.
> > > About the proposal to change the error into a warning and then
> > > rely on people interested in AMP to add operators to the AMP lists
> > > - I do not like this proposal because of the user experience it
> > > would introduce. Basically, what would happen is that a user who
> > > downloads a nightly build of MXNet (because they needed a feature
> > > introduced after the last official release) would likely find that
> > > AMP is broken in that build because it happened between updates
> > > to the AMP lists. The test ensures that the list is updated at
> > > the same time as a new operator is introduced. This is really a
> > > question of how we treat nightly releases - should the user expect
> > > all the features to generally work in them or not? If the Community
> > > thinks that this is an acceptable outcome, then I am fine with the
> > > proposal and with regularly updating those lists myself.
> > > 
> > > Thank you
> > > Przemek
> > > 
> > > On 2019/05/28 21:32:42, Przemysław Trędak <ptre...@apache.org>
> > > wrote: 
> > > > Dear Community,
> > > > 
> > > > One of the recently merged features of the 1.5 release, AMP
> > > > (Automatic Mixed Precision) support (PR [1], design doc [5]),
> > > > introduced a requirement that every new operator added to MXNet
> > > > would need to be present in one of the lists (in [2]). To make
> > > > sure that this requirement is not broken when somebody adds a
> > > > new operator and does not know about AMP's existence, a test
> > > > was added to CI ([3]).
> > > > 
> > > > A few people reached out to me (the original author of the
> > > > feature) saying this test increases the burden on developers of
> > > > new operators and should not be an actual error, but just a
> > > > warning (PR for that change [4]). That is why I would like to
> > > > present a motivation for it and discuss with the wider audience
> > > > why I feel it was necessary.
> > > > 
> > > > First, for people who do not know the details of what AMP is -
> > > > it is a solution that tries to automatically apply best
> > > > practices of training in lower precision (FP16) to a user's FP32
> > > > model in order to fully utilize capabilities of modern GPUs
> > > > (and potentially other hardware in the future). It does so by
> > > > casting to lower precision the inputs of operators that benefit
> > > > from it, while casting to full precision the inputs of operators
> > > > that are unsafe to run in lower precision or just do not support it.
> > > > 
> > > > The first iteration of AMP kept two main lists of operators -
> > > > operators that are beneficial and safe to run in FP16 and
> > > > operators that need to be cast to FP32. The problem (raised in
> > > > review of the PR [6], [8]) is how to make sure that the feature
> > > > works as intended and is not inadvertently broken by somebody
> > > > adding a new operator. The failure scenario here is adding a
> > > > new operator that does not support FP16 and so should be cast
> > > > to FP32, but AMP does not know about its existence and so does
> > > > not do the casting. The solution proposed in the review was to
> > > > implicitly treat all of the unknown operators as FP32-only and
> > > > keep the list of operators that work fine in both FP16 and
> > > > FP32. This solution, however, does not really work, because there
> > > > are multiple operators (most notably optimizers) where
> > > > introducing additional casting of the input to FP32 would break
> > > > the operator.
> > > > 
> > > > That is why, after discussion with a few members of the
> > > > community, I decided to proceed with all lists being explicit
> > > > and to introduce a test that fails when somebody adds an
> > > > operator without classifying it into one of the categories, and
> > > > that explains clearly how to do it [7]. It is not an ideal solution, as
> > > > it introduces some burden on the developers who are not aware
> > > > of AMP; however, in the typical case of adding at most a few
> > > > operators to MXNet, the inconvenience is, I think, pretty minor,
> > > > while being important for the feature's correctness going forward.
> > > > 
> > > > I would like to gather Community feedback and ideas on how to
> > > > handle this situation.
> > > > 
> > > > [1] https://github.com/apache/incubator-mxnet/pull/14173
> > > > [2] 
> > > > https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/amp/lists/symbol.py
> > > > [3] 
> > > > https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py
> > > > [4] https://github.com/apache/incubator-mxnet/pull/15085
> > > > [5] 
> > > > https://docs.google.com/document/d/1sQzMoPEwux0WXSWirY07us1POD_6Y8pLYq--b9Fvd1o/edit?usp=sharing
> > > > [6] 
> > > > https://github.com/apache/incubator-mxnet/pull/14173#discussion_r270728019
> > > > [7] 
> > > > https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py#L62-L80
> > > > [8] 
> > > > https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341
> > > > 
> 
> 
