Given that we're switching from the practice of failing the AMP-related test to warning, I intend to merge #15085 soon if there is no objection.
-sz

On 2019/05/30 19:07:07, Przemysław Trędak <[email protected]> wrote:
> Hi Sam and Zhennan,
>
> the problem is not how to implicitly produce a list of all operators not in any other list - that is easy and the code Zhennan provided would work. The problem is that such a list would not actually be correct in all cases - you do NOT want optimizers to land in the FP32_FUNCS list, because then the tensor being updated could potentially be the FP32 copy and not the actual tensor you care about. That is why the first bullet point in the error message in the test is along the lines of: "if you do an optimizer or anything else that does not go in the computational graph, put it in the FP16_FP32_FUNCS list" - that list does not do any wrapping.
>
> HOWEVER, as I was writing this reply I realized that due to pure luck this is not actually what happens - optimizers could in fact be in the FP32_FUNCS list. That is because, as AMP's assumption is that the model being changed is an FP32 model at the start, all weights (and so all the gradients just before application as well) are stored in FP32 already (and so the casting would not actually occur). It would therefore be possible to make the FP32 list implicit (although relying on this lucky coincidence makes me slightly uncomfortable).
>
> Przemek
>
> On 2019/05/30 08:04:05, "Qin, Zhennan" <[email protected]> wrote:
> > How about changing the line below in amp.py:164
> >
> >     wrap_list = fp32_ops if fp32_ops is not None else lists.symbol.FP32_FUNCS
> >
> > to be
> >
> >     plist = ctypes.POINTER(ctypes.c_char_p)()
> >     size = ctypes.c_uint()
> >
> >     check_call(_LIB.MXListAllOpNames(ctypes.byref(size), ctypes.byref(plist)))
> >     op_names = []
> >     for i in range(size.value):
> >         op_names.append(py_str(plist[i]))
> >
> >     wrap_list = []
> >     fp16_op_list = lists.symbol.FP16_FUNCS + lists.symbol.WIDEST_TYPE_CASTS + lists.symbol.FP16_FP32_FUNCS + list(map(lambda x: x[0], lists.symbol.CONDITIONAL_FP32_FUNCS))
> >     for op_name in op_names:
> >         if not op_name.startswith("_backward_") and not op_name.startswith("_contrib_backward_") and op_name not in fp16_op_list:
> >             wrap_list.append(op_name)
> >     wrap_list = fp32_ops if fp32_ops is not None else wrap_list
> >
> > I've checked that the changed code produces a wrap_list identical to lists.symbol.FP32_FUNCS.
> >
> > The check code is:
> >
> >     print("op that is in wrap_list but not in FP32_FUNCS:")
> >     for i in wrap_list:
> >         if i not in lists.symbol.FP32_FUNCS:
> >             print(i)
> >
> >     print("op that is in FP32_FUNCS but not in wrap_list:")
> >     for i in lists.symbol.FP32_FUNCS:
> >         if i not in wrap_list:
> >             print(i)
> >
> > The output is:
> >
> >     op that is in wrap_list but not in FP32_FUNCS:
> >     op that is in FP32_FUNCS but not in wrap_list:
> >
> > This change should automatically treat new operators as FP32-only.
> >
> > Thanks,
> > Zhennan
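For reference, Zhennan's proposal pulled out of the quoting above into a self-contained snippet (a sketch only: the helper name all_op_names is ours, and it assumes the 1.5-era mxnet.contrib.amp.lists layout and the internal MXListAllOpNames C API shown above):

    import ctypes
    from mxnet.base import _LIB, check_call, py_str
    from mxnet.contrib.amp import lists

    def all_op_names():
        # Enumerate every registered operator via the C API.
        plist = ctypes.POINTER(ctypes.c_char_p)()
        size = ctypes.c_uint()
        check_call(_LIB.MXListAllOpNames(ctypes.byref(size), ctypes.byref(plist)))
        return [py_str(plist[i]) for i in range(size.value)]

    # Ops already classified as safe (or conditionally safe) for FP16.
    fp16_op_list = (lists.symbol.FP16_FUNCS
                    + lists.symbol.WIDEST_TYPE_CASTS
                    + lists.symbol.FP16_FP32_FUNCS
                    + [x[0] for x in lists.symbol.CONDITIONAL_FP32_FUNCS])

    # Everything else (except backward ops) is implicitly treated as FP32-only.
    wrap_list = [op for op in all_op_names()
                 if not op.startswith(("_backward_", "_contrib_backward_"))
                 and op not in fp16_op_list]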
> > On Wed, 2019-05-29 at 19:13 +0000, Skalicky, Sam wrote:
> > > Thanks Przemek for the additional explanation, but I'm still confused about this part. I don't understand the explanation of the optimizer's interaction here.
> > >
> > > > > The technical reason for that is that the only place from which one can get MXNet operators is the MXListAllOps call, which gives back all operators without any way of differentiating them, including for example calls to optimizers. If somebody added a new optimizer and AMP inserted an FP32 cast before its input, the optimizer would actually update the FP32 copy of the tensor instead of the tensor itself. This is not a performance problem (which, I agree, would be perfectly fine to solve later) - it is a correctness problem. That is why the error message in the test explicitly mentions this case as one where the operator should be put in the list that does not make any casts.
> > >
> > > If this is the third list that Anirudh mentions:
> > >
> > > > 3. Not sure, don't know, or don't care: FP32 list.
> > >
> > > and this list simply contains all operators that have not been categorized, why do we need the list at all?
> > >
> > > Thanks,
> > > Sam
> > >
> > > On May 29, 2019, at 11:58 AM, Sheng Zha <[email protected]> wrote:
> > > > > The second misunderstanding is that the ask from the test is to somehow "add support" for AMP in the operator. That is definitely not true, and adding a single line in either FP32_FUNCS (cast me always to FP32) or FP16_FP32_FUNCS (do not do anything with me because I'm not relevant for this use case) is perfectly fine.
> > > >
> > > > I respectfully disagree with this statement. In this case, the "support to add" is exactly to differentiate how the new operator should be treated. It was not there before, so it's new. And requiring other developers to add any line is totally not fine.
> > > >
> > > > > The test ensures that the list is updated at the same time as a new operator is introduced.
> > > >
> > > > It would indeed have that effect. The problem is that the test requires the wrong person to add such support. A developer who adds a new operator should be under no obligation to do anything with AMP.
> > > >
> > > > Having it as a warning both at runtime and in the test has the effect of prompting people who actually use and care about the performance from AMP to take the steps and decide where a new operator fits. And like you said, "It is up to people like me actively working on better FP16 support to go through the FP32 list to introduce support and then move such operator to another list." This is also exactly how it should work with AMP.
> > > >
> > > > -sz
> > > >
> > > > On 2019/05/29 18:25:46, Przemysław Trędak <[email protected]> wrote:
> > > > > Thank you all for responding.
> > > > >
> > > > > Let me address a few misunderstandings about AMP in the discussion so far:
> > > > > - I think the main misunderstanding in the discussion so far is that it would somehow be possible to implicitly cast all previously unseen operators to FP32. I would much prefer such a solution myself; however, this is (unfortunately) not really possible. The technical reason for that is that the only place from which one can get MXNet operators is the MXListAllOps call, which gives back all operators without any way of differentiating them, including for example calls to optimizers. If somebody added a new optimizer and AMP inserted an FP32 cast before its input, the optimizer would actually update the FP32 copy of the tensor instead of the tensor itself. This is not a performance problem (which, I agree, would be perfectly fine to solve later) - it is a correctness problem. That is why the error message in the test explicitly mentions this case as one where the operator should be put in the list that does not make any casts.
> > > > > - The second misunderstanding is that the ask from the test is to somehow "add support" for AMP in the operator. That is definitely not true, and adding a single line in either FP32_FUNCS (cast me always to FP32) or FP16_FP32_FUNCS (do not do anything with me because I'm not relevant for this use case) is perfectly fine. It is up to people like me actively working on better FP16 support to go through the FP32 list to introduce support and then move such operators to another list.
> > > > > - The third (although minor) misunderstanding is that this feature is completely GPU-specific. It is not, and actually the review comment that prompted me to introduce the design as currently implemented came from an MKLDNN developer :-).
> > > > >
> > > > > About the proposal to move it from a Python list into an operator property - I am in favor of it and, as Anirudh mentioned, this should make it much easier when just adding aliases to operators. But again, this could unfortunately not be an optional parameter, since an automatic "put me in the FP32 list" is not possible.
> > > > > About the proposal to change the error into a warning and then rely on people interested in AMP to add operators to the AMP lists - I do not like this proposal because of the user experience it would introduce. Basically what would happen is that a user who downloads a nightly build of MXNet (because they needed a feature introduced after the last official release) would likely find that AMP is broken in that build, because it happened to fall between updates to the AMP lists. The test ensures that the lists are updated at the same time as a new operator is introduced. This is really a question of how we treat nightly releases - should the user expect all features to generally work in them or not? If the Community thinks that this is an acceptable outcome, then I am fine with the proposal and will regularly update those lists myself.
> > > > >
> > > > > Thank you
> > > > > Przemek
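To make the correctness hazard Przemek describes concrete, here is a minimal imperative sketch (hypothetical shapes and learning rate; sgd_update stands in for any optimizer op). If an implicit "cast everything unknown to FP32" rule applied to an optimizer, the update would land on the cast copy rather than on the weight itself:

    import mxnet as mx

    weight = mx.nd.ones((2, 2), dtype='float16')   # the real parameter
    grad = mx.nd.ones((2, 2), dtype='float16')

    # What an implicit FP32 cast in front of the optimizer would do:
    weight_fp32 = weight.astype('float32')         # inserted cast -> a *copy*
    mx.nd.sgd_update(weight_fp32, grad.astype('float32'),
                     lr=0.1, out=weight_fp32)      # updates the copy in place

    print(weight.asnumpy())       # still all ones -- training silently stalls
    print(weight_fp32.asnumpy())  # only the cast copy received the update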
> > > > > On 2019/05/28 21:32:42, Przemysław Trędak <[email protected]> wrote:
> > > > > > Dear Community,
> > > > > >
> > > > > > One of the recently merged features of the 1.5 release, AMP (Automatic Mixed Precision) support (PR [1], design doc [5]), introduced a requirement that every new operator added to MXNet needs to be present in one of the lists in [2]. To make sure that this requirement is not broken when somebody adds a new operator and does not know about AMP's existence, a test was added to CI ([3]).
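The gist of that CI test ([3], [7]) is a coverage assertion along the following lines (a condensed sketch, not the verbatim test code; the real test may check additional lists):

    import ctypes
    from mxnet.base import _LIB, check_call, py_str
    from mxnet.contrib.amp import lists

    plist = ctypes.POINTER(ctypes.c_char_p)()
    size = ctypes.c_uint()
    check_call(_LIB.MXListAllOpNames(ctypes.byref(size), ctypes.byref(plist)))
    all_ops = [py_str(plist[i]) for i in range(size.value)]

    # Union of the AMP classification lists.
    covered = (set(lists.symbol.FP16_FUNCS) | set(lists.symbol.FP32_FUNCS)
               | set(lists.symbol.FP16_FP32_FUNCS)
               | set(lists.symbol.WIDEST_TYPE_CASTS)
               | {x[0] for x in lists.symbol.CONDITIONAL_FP32_FUNCS})

    uncovered = [op for op in all_ops
                 if not op.startswith(('_backward_', '_contrib_backward_'))
                 and op not in covered]
    assert not uncovered, 'Operators not classified into any AMP list: %s' % uncovered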
> > > > > > A few people reached out to me (the original author of the feature) saying this test increases the burden on a developer of new operators and should not be an actual error, but just a warning (PR for that change: [4]). That is why I would like to present the motivation for it and discuss with the wider audience why I feel it was necessary.
> > > > > >
> > > > > > First, for people who do not know the details of what AMP is - it is a solution that tries to automatically apply the best practices of training in lower precision (FP16) to a user's FP32 model in order to fully utilize the capabilities of modern GPUs (and potentially other hardware in the future). It does so by casting to lower precision the inputs of operators that benefit from it, while casting to full precision the inputs of operators that are unsafe to run in lower precision or simply do not support it.
> > > > > >
> > > > > > The first iteration of AMP kept 2 main lists of operators - operators that are beneficial and safe to do in FP16, and operators that need to be cast to FP32. The problem (raised in review of the PR [6], [8]) is how to make sure that the feature works as intended and is not inadvertently broken by somebody adding a new operator. The failure scenario here is adding a new operator that does not support FP16 and so should be cast to FP32, but AMP does not know about its existence and so does not do the casting. The solution proposed in the review was to implicitly treat all unknown operators as FP32-only and keep only the list of operators that work fine in both FP16 and FP32. This solution, however, does not really work, because there are multiple operators (most notably optimizers) where introducing an additional cast of the input to FP32 would break the operator.
> > > > > >
> > > > > > That is why, after discussion with a few members of the community, I decided to proceed with all lists being explicit and to introduce a test that fails when somebody adds an operator without classifying it into one of the categories, and explains clearly how to do so [7]. It is not an ideal solution, as it puts some burden on developers who are not aware of AMP; however, in the typical case of adding at most a few operators to MXNet, the inconvenience is, I think, pretty minor, while important for the correctness of the feature going forward.
> > > > > >
> > > > > > I would like to gather Community feedback and ideas on how to handle this situation.
> > > > > > [1] https://github.com/apache/incubator-mxnet/pull/14173
> > > > > > [2] https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/amp/lists/symbol.py
> > > > > > [3] https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py
> > > > > > [4] https://github.com/apache/incubator-mxnet/pull/15085
> > > > > > [5] https://docs.google.com/document/d/1sQzMoPEwux0WXSWirY07us1POD_6Y8pLYq--b9Fvd1o/edit?usp=sharing
> > > > > > [6] https://github.com/apache/incubator-mxnet/pull/14173#discussion_r270728019
> > > > > > [7] https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py#L62-L80
> > > > > > [8] https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341
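For context on what users of the feature actually write: typical AMP usage per the design doc [5] is only a few lines (a sketch against the 1.5-era contrib API; the commented-out calls show the intended Gluon trainer integration and may differ in detail):

    from mxnet.contrib import amp

    amp.init()  # patches Symbol/NDArray operator calls so that amp_cast /
                # amp_multicast nodes are inserted according to the lists in [2]

    # ... build the network and Gluon Trainer as usual, then:
    # amp.init_trainer(trainer)            # enable dynamic loss scaling
    # with amp.scale_loss(loss, trainer) as scaled_loss:
    #     autograd.backward(scaled_loss)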
