How about change the below line in amp.py:164 wrap_list = fp32_ops if fp32_ops is not None else lists.symbol.FP32_FUNCS
to be plist = ctypes.POINTER(ctypes.c_char_p)() size = ctypes.c_uint() check_call(_LIB.MXListAllOpNames(ctypes.byref(size), ctypes.byref(plist))) op_names = [] for i in range(size.value): op_names.append(py_str(plist[i])) wrap_list = [] fp16_op_list = lists.symbol.FP16_FUNCS + lists.symbol.WIDEST_TYPE_CASTS + lists.symbol.FP16_FP32_FUNCS + list(map(lambda x: x[0], lists.symbol.CONDITIONAL_FP32_FUNCS)) for op_name in op_names: if not op_name.startswith("_backward_") and not op_name.startswith("_contrib_backward_") and op_name not in fp16_op_list: wrap_list.append(op_name) wrap_list = fp32_ops if fp32_ops is not None else wrap_list I've checked, the changed code can produce identity wrap_list as lists.symbol.FP16_FP32_FUNCS. The check code is, print("op that in wrap_list but not in FP32_FUNCS:") for i in wrap_list: if i not in lists.symbol.FP32_FUNCS: print(i) print("op that in FP32_FUNCS but not in wrap_list:") for i in lists.symbol.FP32_FUNCS: if i not in wrap_list: print(i) The output is, op that in infered_fp32_op_list but not in FP32_FUNCS: op that in FP32_FUNCS but not in infered_fp32_op_list: op that in infered_fp32_op_list but not in FP32_FUNCS: op that in FP32_FUNCS but not in infered_fp32_op_list: This change should be able to automatically set new operator as fp32 only. Thanks, Zhennan On Wed, 2019-05-29 at 19:13 +0000, Skalicky, Sam wrote: > Thanks Przemek for the additional explanation, but I’m still confused > on this part. I don’t understand the explanation of the optimizer’s > interaction here. > > > > The technical reason for that is the only place from which one > > > can get MXNet operators is MXListAllOps call, which gives back > > > all operators without any way of differentiating them, including > > > for example calls to optimizers. If somebody added a new > > > optimizer and AMP inserts a FP32 cast before its input, the > > > optimizer would actually update the FP32 copy of the tensor > > > instead of the tensor itself. This is not a performance problem > > > (which, I agree, would be perfectly fine to solve later) - it is > > > a correctness problem. > > > That is why the error message in the test explicitly mentions > > > this case as one where the operator should be put in the list > > > that does not make any casts. > > > If this is third list that Anirudh mentions: > > > > 3. Not sure, Don't know or dont care: FP32 list. > > > and this list simply contains all operators that have not been > categorized, why do we need the list at all? > > Thanks, > Sam > > > > On May 29, 2019, at 11:58 AM, Sheng Zha <zhash...@apache.org> > > wrote: > > > > > The second misunderstanding is that the ask from the test is to > > > somehow "add support" for AMP in the operator. That is definitely > > > not true and adding a single line in either FP32_FUNCS (cast me > > > always to FP32) or FP16_FP32_FUNCS (do not do anything with me > > > because I'm not relevant for this usecase) is perfectly fine. > > > > I respectfully disagree with this statement. In this case, the > > "support to add" is exactly to differentiate how the new operator > > should be treated. It was not there before, so it's new. And > > requiring other developers to add any line is totally not fine. > > > > > The test ensures that the list is updated at the same time as new > > > operator is introduced. > > > > It would indeed have that effect. The problem is that the test > > requires the wrong person to add such support. A developer who adds > > a new operator should be under no obligation to do anything with > > AMP. > > > > Having it as warning both in runtime and in the test has the effect > > to prompt people who actually use and care about the performance > > from AMP to take the steps and decide where a new operator fits. > > And like you said, "It is up to people like me actively working on > > better FP16 support to go through the FP32 list to introduce > > support and then move such operator to another list." This is also > > exactly how it should work with AMP. > > > > -sz > > > > On 2019/05/29 18:25:46, Przemys��aw Tr��dak <ptre...@apache.org> > > wrote: > > > Thank you all for responding. > > > > > > Let me address a few misunderstandings about AMP in the > > > discussion so far: > > > - I think the main misunderstanding in the discussion so far is > > > that it would be somehow possible to implicitly cast to FP32 all > > > previously unseen operators. I would much prefer such solution > > > myself, however this is actually (unfortunately) not really > > > possible. The technical reason for that is the only place from > > > which one can get MXNet operators is MXListAllOps call, which > > > gives back all operators without any way of differentiating them, > > > including for example calls to optimizers. If somebody added a > > > new optimizer and AMP inserts a FP32 cast before its input, the > > > optimizer would actually update the FP32 copy of the tensor > > > instead of the tensor itself. This is not a performance problem > > > (which, I agree, would be perfectly fine to solve later) - it is > > > a correctness problem. That is why the error message in the test > > > explicitly mentions this case as one where the operator should be > > > put in the list that does not make any casts. > > > - The second misunderstanding is that the ask from the test is to > > > somehow "add support" for AMP in the operator. That is definitely > > > not true and adding a single line in either FP32_FUNCS (cast me > > > always to FP32) or FP16_FP32_FUNCS (do not do anything with me > > > because I'm not relevant for this usecase) is perfectly fine. It > > > is up to people like me actively working on better FP16 support > > > to go through the FP32 list to introduce support and then move > > > such operator to another list. > > > - The third (although minor) misunderstanding is that this > > > feature is completely GPU-specific. It is not, and actually the > > > review comment that prompted me to introduce the design as is > > > currently implemented comes from MKLDNN developer :-). > > > > > > About the proposal to move it from Python list into a operator > > > property - I am in favor of it and, as Anirudh mentioned, this > > > should make it much easier when just adding aliases to the > > > operators. But again, this could not be an optional parameter > > > unfortunately since automatic "put me in FP32 list" is not > > > possible. > > > About the proposal to change the error into a warning and then > > > rely on people interested in AMP to adding operators to AMP lists > > > - I do not like this proposal because of user experience that it > > > would introduce. Basically what would happen is a user who > > > downloads nightly build of MXNet (because they needed a feature > > > introduced after last official release) would likely find that > > > AMP is broken in that build because it happened between updates > > > to the AMP lists. The test ensures that the list is updated at > > > the same time as new operator is introduced. This is a question > > > really of how do we treat nightly releases - should the user > > > expect for all the features to generally work in them or not? If > > > the Community thinks that it is acceptable outcome then I am fine > > > with the proposal and regularly updating those lists myself. > > > > > > Thank you > > > Przemek > > > > > > On 2019/05/28 21:32:42, Przemys��aw Tr��dak <ptre...@apache.org> > > > wrote: > > > > Dear Community, > > > > > > > > One of the recently merged features of the 1.5 release, AMP > > > > (Automatic Mixed Precision) support (PR [1], design doc [5]), > > > > introduced a requirement that every new operator added to MXNet > > > > would need to be present in 1 of the lists (in [2]). To make > > > > sure that this requirement is not broken when somebody adds a > > > > new operator and does not know about AMP's existence, a test > > > > was added to CI ([3]). > > > > > > > > A few people reached out to me (the original author of the > > > > feature) saying this test increases a burden on a developer of > > > > new operators and should not be an actual error, but just > > > > warning (PR for that change [4]). That is why I would like to > > > > present a motivation for it and discuss with the wider audience > > > > why I feel it was necessary. > > > > > > > > First, for people who do not know the details of what AMP is - > > > > it is a solution that tries to automatically apply best > > > > practices of training in lower precision (FP16) to user's FP32 > > > > model in order to fully utilize capabilities of modern GPUs > > > > (and potentially other hardware in the future). It does so by > > > > casting to lower precision inputs to operators benefitting from > > > > it, while casting to full precision inputs of operators that > > > > are unsafe to run in lower precision or just do not support it. > > > > > > > > The first iteration of AMP kept 2 main lists of operators - > > > > operators that are beneficial and safe to do in fp16 and > > > > operators that need to be cast to FP32. The problem (raised in > > > > review of the PR [6], [8]) is how to make sure that the feature > > > > works as intended and is not inadvertently broken by somebody > > > > adding a new operator. The failure scenario here is adding a > > > > new operator that does not support FP16 and so should be cast > > > > to FP32, but AMP does not know about its existence and so does > > > > not do the casting. The solution proposed in the review was to > > > > implicitly treat all of the unknown operators as FP32-only and > > > > keep the list of operators that work fine in both FP16 and > > > > FP32. This solution however does not really work, because there > > > > are multiple operators (most notably optimizers) where > > > > introducing additional casting of the input to FP32 would break > > > > the operator. > > > > > > > > That is why after discussion with a few members of the > > > > community, I decided to proceed with all lists being explicit > > > > and introducing the test that would fail when somebody added an > > > > operator without classifying it into 1 of the categories, and > > > > explain clearly how to do it [7]. It is not ideal solution, as > > > > it introduces some burden on the developers who are not aware > > > > about AMP, however in the typical case of adding at most a few > > > > operators to MXNet the inconvenience is I think pretty minor > > > > while important for the feature correctness going forward. > > > > > > > > I would like to gather Community feedback and ideas how to > > > > handle this situation. > > > > > > > > [1] https://github.com/apache/incubator-mxnet/pull/14173 > > > > [2] > > > > https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/amp/lists/symbol.py > > > > [3] > > > > https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py > > > > [4] https://github.com/apache/incubator-mxnet/pull/15085 > > > > [5] > > > > https://docs.google.com/document/d/1sQzMoPEwux0WXSWirY07us1POD_6Y8pLYq--b9Fvd1o/edit?usp=sharing > > > > [6] > > > > https://github.com/apache/incubator-mxnet/pull/14173#discussion_r270728019 > > > > [7] > > > > https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py#L62-L80 > > > > [8] > > > > https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341 > > > > > >