Thanks Przemek for the additional explanation, but I'm still confused about this part. I don't understand the explanation of the optimizer's interaction here.
>> The technical reason for that is that the only place from which one can
>> get MXNet operators is the MXListAllOps call, which gives back all
>> operators without any way of differentiating them, including for example
>> calls to optimizers. If somebody added a new optimizer and AMP inserted
>> an FP32 cast before its input, the optimizer would actually update the
>> FP32 copy of the tensor instead of the tensor itself. This is not a
>> performance problem (which, I agree, would be perfectly fine to solve
>> later) - it is a correctness problem.
>> That is why the error message in the test explicitly mentions this case
>> as one where the operator should be put in the list that does not make
>> any casts.

If this is the third list that Anirudh mentions:

>> 3. Not sure, don't know or don't care: FP32 list.

and this list simply contains all operators that have not been categorized, why do we need the list at all?
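Separately, on the optimizer interaction, let me try to restate the failure mode as a toy numpy illustration (invented names, not actual MXNet or AMP code) - is this what you mean?

    import numpy as np

    def sgd_update(weight, grad, lr):
        # Optimizer operators update their weight input *in place*.
        weight -= lr * grad

    def amp_fp32_cast(x):
        # AMP's cast puts a fresh FP32 tensor in front of the op's input.
        return x.astype(np.float32)

    w = np.ones(3, dtype=np.float16)
    g = np.full(3, 0.5, dtype=np.float16)

    # Without AMP interfering, the real weight is updated:
    sgd_update(w, g, lr=0.1)

    # With an FP32 cast wrongly inserted before the optimizer:
    w_fp32 = amp_fp32_cast(w)
    sgd_update(w_fp32, g.astype(np.float32), lr=0.1)
    # w_fp32 was updated, but the actual weight w was not - a silent
    # correctness bug, not just a performance one.

If that is the scenario, I see why it is a correctness problem rather than a performance one.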
Thanks,
Sam

> On May 29, 2019, at 11:58 AM, Sheng Zha <[email protected]> wrote:
> 
>> The second misunderstanding is that the ask from the test is to somehow
>> "add support" for AMP in the operator. That is definitely not true, and
>> adding a single line in either FP32_FUNCS (cast me always to FP32) or
>> FP16_FP32_FUNCS (do not do anything with me because I'm not relevant for
>> this use case) is perfectly fine.
> 
> I respectfully disagree with this statement. In this case, the "support
> to add" is exactly to differentiate how the new operator should be
> treated. It was not there before, so it's new. And requiring other
> developers to add any line is totally not fine.
> 
>> The test ensures that the lists are updated at the same time as a new
>> operator is introduced.
> 
> It would indeed have that effect. The problem is that the test requires
> the wrong person to add such support. A developer who adds a new operator
> should be under no obligation to do anything with AMP.
> 
> Having it as a warning, both at runtime and in the test, prompts the
> people who actually use AMP and care about its performance to take the
> necessary steps and decide where a new operator fits. And like you said,
> "It is up to people like me actively working on better FP16 support to go
> through the FP32 list to introduce support and then move such operators
> to another list." This is also exactly how it should work with AMP.
> 
> -sz
> 
> On 2019/05/29 18:25:46, Przemysław Trędak <[email protected]> wrote:
>> Thank you all for responding.
>> 
>> Let me address a few misunderstandings about AMP in the discussion so far:
>> - I think the main misunderstanding is that it would somehow be possible
>> to implicitly cast to FP32 all previously unseen operators. I would much
>> prefer such a solution myself; however, it is (unfortunately) not really
>> possible. The technical reason for that is that the only place from
>> which one can get MXNet operators is the MXListAllOps call, which gives
>> back all operators without any way of differentiating them, including
>> for example calls to optimizers. If somebody added a new optimizer and
>> AMP inserted an FP32 cast before its input, the optimizer would actually
>> update the FP32 copy of the tensor instead of the tensor itself. This is
>> not a performance problem (which, I agree, would be perfectly fine to
>> solve later) - it is a correctness problem. That is why the error
>> message in the test explicitly mentions this case as one where the
>> operator should be put in the list that does not make any casts.
>> - The second misunderstanding is that the ask from the test is to
>> somehow "add support" for AMP in the operator. That is definitely not
>> true, and adding a single line in either FP32_FUNCS (cast me always to
>> FP32) or FP16_FP32_FUNCS (do not do anything with me because I'm not
>> relevant for this use case) is perfectly fine. It is up to people like
>> me actively working on better FP16 support to go through the FP32 list
>> to introduce support and then move such operators to another list.
>> - The third (although minor) misunderstanding is that this feature is
>> completely GPU-specific. It is not - in fact, the review comment that
>> prompted me to introduce the design as currently implemented came from
>> an MKLDNN developer :-).
>> 
>> About the proposal to move it from a Python list into an operator
>> property - I am in favor of it and, as Anirudh mentioned, this should
>> make things much easier when just adding aliases to operators. But
>> again, this could unfortunately not be an optional parameter, since an
>> automatic "put me in the FP32 list" is not possible.
>> About the proposal to change the error into a warning and then rely on
>> people interested in AMP to add operators to the AMP lists - I do not
>> like this proposal because of the user experience it would introduce.
>> Basically, a user who downloads a nightly build of MXNet (because they
>> needed a feature introduced after the last official release) would
>> likely find that AMP is broken in that build, because the build happened
>> between updates to the AMP lists. The test ensures that the lists are
>> updated at the same time as a new operator is introduced. This is really
>> a question of how we treat nightly releases - should the user expect all
>> features to generally work in them or not? If the Community thinks that
>> this is an acceptable outcome, then I am fine with the proposal and with
>> regularly updating those lists myself.
>> 
>> Thank you
>> Przemek
>> 
>> On 2019/05/28 21:32:42, Przemysław Trędak <[email protected]> wrote:
>>> Dear Community,
>>> 
>>> One of the recently merged features of the 1.5 release, AMP (Automatic
>>> Mixed Precision) support (PR [1], design doc [5]), introduced a
>>> requirement that every new operator added to MXNet needs to be present
>>> in one of the lists (in [2]). To make sure that this requirement is not
>>> broken when somebody adds a new operator and does not know about AMP's
>>> existence, a test was added to CI ([3]).
>>> 
>>> A few people reached out to me (the original author of the feature)
>>> saying this test increases the burden on developers of new operators
>>> and should not be an actual error, but just a warning (PR for that
>>> change: [4]). That is why I would like to present the motivation for it
>>> and discuss with the wider audience why I feel it was necessary.
>>> 
>>> First, for people who do not know the details of what AMP is - it is a
>>> solution that tries to automatically apply the best practices of
>>> training in lower precision (FP16) to a user's FP32 model, in order to
>>> fully utilize the capabilities of modern GPUs (and potentially other
>>> hardware in the future). It does so by casting to lower precision the
>>> inputs of operators that benefit from it, while casting to full
>>> precision the inputs of operators that are unsafe to run in lower
>>> precision or simply do not support it.
>>> 
>>> The first iteration of AMP kept 2 main lists of operators - operators
>>> that are beneficial and safe to do in FP16, and operators that need to
>>> be cast to FP32.
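For concreteness, the lists in [2] are plain Python lists of operator names. A heavily abbreviated sketch (the example entries are illustrative guesses, not the actual file contents):

    # Sketch of python/mxnet/contrib/amp/lists/symbol.py [2].

    # Beneficial and numerically safe in FP16 - AMP casts their inputs down.
    FP16_FUNCS = [
        'Convolution',
        'FullyConnected',
    ]

    # Unsafe or unsupported in FP16 - AMP casts their inputs up to FP32.
    FP32_FUNCS = [
        'exp',
        'softmax_cross_entropy',
    ]

    # Fine in either precision, or must not be touched at all
    # (e.g. optimizer updates) - AMP inserts no casts around these.
    FP16_FP32_FUNCS = [
        'relu',
        'sgd_update',
    ]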
>>> The problem (raised in review of the PR [6], [8]) is how to make sure
>>> that the feature works as intended and is not inadvertently broken by
>>> somebody adding a new operator. The failure scenario here is adding a
>>> new operator that does not support FP16 and so should be cast to FP32,
>>> but AMP does not know about its existence and so does not do the
>>> casting. The solution proposed in the review was to implicitly treat
>>> all unknown operators as FP32-only and keep only the list of operators
>>> that work fine in both FP16 and FP32. This solution, however, does not
>>> really work, because there are multiple operators (most notably
>>> optimizers) where introducing an additional cast of the input to FP32
>>> would break the operator.
>>> 
>>> That is why, after discussion with a few members of the community, I
>>> decided to proceed with all lists being explicit and to introduce a
>>> test that fails when somebody adds an operator without classifying it
>>> into one of the categories, and that explains clearly how to do so [7].
>>> It is not an ideal solution, as it puts some burden on developers who
>>> are not aware of AMP; however, in the typical case of adding at most a
>>> few operators to MXNet, the inconvenience is, I think, pretty minor,
>>> while being important for the correctness of the feature going forward.
>>> 
>>> I would like to gather Community feedback and ideas on how to handle
>>> this situation.
>>> 
>>> [1] https://github.com/apache/incubator-mxnet/pull/14173
>>> [2] https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/amp/lists/symbol.py
>>> [3] https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py
>>> [4] https://github.com/apache/incubator-mxnet/pull/15085
>>> [5] https://docs.google.com/document/d/1sQzMoPEwux0WXSWirY07us1POD_6Y8pLYq--b9Fvd1o/edit?usp=sharing
>>> [6] https://github.com/apache/incubator-mxnet/pull/14173#discussion_r270728019
>>> [7] https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_amp.py#L62-L80
>>> [8] https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341
>>
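P.S. To make sure I read [3] correctly, the check roughly boils down to the following (a hand-written approximation, not the actual test; the real one presumably also handles aliases and internal operators):

    import ctypes

    import mxnet.contrib.amp.lists.symbol as amp_symbol
    from mxnet.base import _LIB, check_call, py_str

    def list_all_op_names():
        # All registered operators, via the C API the thread refers to.
        size = ctypes.c_uint()
        plist = ctypes.POINTER(ctypes.c_char_p)()
        check_call(_LIB.MXListAllOpNames(ctypes.byref(size),
                                         ctypes.byref(plist)))
        return [py_str(plist[i]) for i in range(size.value)]

    def uncategorized_ops():
        # Union of every operator-name list defined in [2].
        covered = set()
        for value in vars(amp_symbol).values():
            if isinstance(value, list):
                covered.update(value)
        return sorted(set(list_all_op_names()) - covered)

    # The CI test errors out when this is non-empty; the proposal in [4]
    # would only warn instead.
    assert not uncategorized_ops(), uncategorized_ops()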
