Junru Shao created MXNET-1417:
---------------------------------
Summary: [Performance] Caching Dynamic Shape Checking Result
Key: MXNET-1417
URL: https://issues.apache.org/jira/browse/MXNET-1417
Project: Apache MXNet
Issue Type: Improvement
Reporter: Junru Shao
h2. Description
(See the Comments section below for experiment details.)
PR [#1324|https://github.com/apache/incubator-mxnet/issues/1324], which enables
dynamic shapes, slows down a model that originally ran in 235.65 ms by 7.26 ms
(to 242.91 ms, about 3.1% slower).
A seemingly related PR,
[#14665|https://github.com/apache/incubator-mxnet/pull/14665], which is labeled
as a "[performance]" improvement, does not measurably change this number: the
model still runs in 242.35 ms.
This PR fixes the regression by caching the result of checking whether the
graph contains dynamic shapes. The mechanism is quite simple: once the check has
been performed, it is not repeated, because the graph does not change.
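The caching idea can be illustrated with a minimal Python sketch. It is not the actual MXNet change. {{Graph}}, {{has_dynamic_shape}}, {{forward}}, and the convention that {{-1}} marks a dimension unknown until runtime are all hypothetical names and assumptions chosen for illustration. The expensive scan over every node runs on the first call only. Later calls reuse the stored flag, which is valid only because the graph is never mutated after it is built.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A node's output shape; -1 marks a dimension unknown until runtime.
# (Hypothetical encoding chosen for this sketch.)
Shape = Tuple[int, ...]


@dataclass
class Graph:
    """Hypothetical stand-in for a hybridized graph whose nodes never change."""

    node_shapes: List[Shape]
    # Cached check result; None means "not checked yet".
    _has_dynamic_shape: Optional[bool] = field(default=None, init=False, repr=False)
    # Number of times the full O(#nodes) scan actually ran.
    scans: int = field(default=0, init=False)

    def _scan_for_dynamic_shape(self) -> bool:
        # The expensive part: visit every dimension of every node.
        self.scans += 1
        return any(dim < 0 for shape in self.node_shapes for dim in shape)

    def has_dynamic_shape(self) -> bool:
        # Scan on the first call only; later calls reuse the stored flag.
        # Safe only because the graph is never mutated after creation.
        if self._has_dynamic_shape is None:
            self._has_dynamic_shape = self._scan_for_dynamic_shape()
        return self._has_dynamic_shape


def forward(graph: Graph) -> str:
    # Every forward call consults the flag to choose an execution path.
    return "dynamic" if graph.has_dynamic_shape() else "static"


if __name__ == "__main__":
    g = Graph([(32, 128), (32, 10)])
    for _ in range(1000):
        forward(g)
    print(g.scans)  # → 1
```

The sentinel is {{None}} and the check uses {{is None}} rather than truthiness. This way a cached {{False}} ("no dynamic shapes"), which is the common case for a static-allocated model, is not mistaken for "not checked yet" and re-scanned on every call.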
h2. Checklist
h3. Essentials
Please feel free to remove inapplicable items for your PR.
* The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the
relevant [JIRA issue|https://issues.apache.org/jira/projects/MXNET/issues]
created (except PRs with tiny changes)
* Changes are complete (i.e. I finished coding on this PR)
* All changes have test coverage:
** Unit tests are added for small changes to verify correctness (e.g. adding a
new operator)
** Nightly tests are added for complicated/long-running ones (e.g. changing
distributed kvstore)
** Build tests will be added for build configuration changes (e.g. adding a new
build option with NCCL)
* Code is well-documented:
** For user-facing API changes, API doc string has been updated.
** For new C++ functions in header files, their functionalities and arguments
are documented.
** For new examples, README.md is added to explain what the example does,
the source of the dataset, expected performance on test set and reference to
the original paper if applicable
* Check the API doc at
[http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html]
* To the best of my knowledge, examples are either not affected by this change,
or have been fixed to be compatible with this change
h3. Changes
Nothing
h2. Comments
Experiment environment: EC2 p2.8xlarge, CUDA 10 and cuDNN 7.5. The model itself
is confidential.
Detailed benchmark results are below (mean ± stdev over 20 runs; the warm-up
run is excluded).
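The measurement protocol above (one discarded warm-up run, then 20 timed runs reported as mean ± stdev) can be sketched as a small harness. The actual benchmark script is not part of this issue, so {{benchmark}} and the placeholder workload are hypothetical. In a real MXNet measurement, {{run_once}} would be one forward pass of a Gluon {{HybridBlock}} after {{net.hybridize(static_alloc=True)}}. Because MXNet executes GPU work asynchronously, {{run_once}} would also need to block until that work finishes, for example by calling {{mx.nd.waitall()}}; otherwise the timer only measures how long it takes to enqueue the work.

```python
import statistics
import time
from typing import Callable, Tuple


def benchmark(run_once: Callable[[], object], runs: int = 20) -> Tuple[float, float]:
    """Return (mean, stdev) in milliseconds over `runs` timed calls.

    One extra warm-up call is made first and its time is discarded, since the
    first call typically pays one-off costs (e.g. graph construction, allocation).
    """
    run_once()  # warm-up run, excluded from the statistics
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # statistics.stdev is the sample standard deviation (needs runs >= 2).
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


if __name__ == "__main__":
    # Placeholder workload standing in for one forward pass of the model.
    mean_ms, stdev_ms = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"{mean_ms:.2f} ± {stdev_ms:.5f} ms")
```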
# On commit
[{{39412b3}}|https://github.com/apache/incubator-mxnet/commit/39412b37ffca84bf3cd10f81dac5c6c77149f3ac]
(right before PR [#14192|https://github.com/apache/incubator-mxnet/pull/14192]
was merged):
Hybridize w/ static_alloc: 235.65 ± 0.22246 ms
# On commit
[{{83d2c2d}}|https://github.com/apache/incubator-mxnet/commit/83d2c2d0e0edeb7d85471437601efcf8bebf070e]
(in which PR [#14192|https://github.com/apache/incubator-mxnet/pull/14192] was
merged):
Hybridize w/ static_alloc: 242.91 ± 0.71125 ms
# PR [#14665|https://github.com/apache/incubator-mxnet/pull/14665] applied on
top of commit
[{{83d2c2d}}|https://github.com/apache/incubator-mxnet/commit/83d2c2d0e0edeb7d85471437601efcf8bebf070e]:
Hybridize w/ static_alloc: 242.35 ± 0.25124 ms
# This patch applied on top of commit
[{{83d2c2d}}|https://github.com/apache/incubator-mxnet/commit/83d2c2d0e0edeb7d85471437601efcf8bebf070e]:
Hybridize w/ static_alloc: 234.95 ± 0.39334 ms
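To make the comparison explicit, the short script below expresses each mean latency above as a change relative to the baseline on commit {{39412b3}}. The numbers are copied from the list above, and {{relative_change}} is a hypothetical helper name.

```python
from typing import Tuple

# Mean latencies (ms) copied from the benchmark list above.
BASELINE_MS = 235.65  # commit 39412b3, before PR #14192
RESULTS_MS = {
    "83d2c2d (PR #14192 merged)": 242.91,
    "83d2c2d + PR #14665": 242.35,
    "83d2c2d + this patch": 234.95,
}


def relative_change(mean_ms: float, baseline_ms: float = BASELINE_MS) -> Tuple[float, float]:
    """Return (absolute delta in ms, delta as a percentage of the baseline)."""
    delta = mean_ms - baseline_ms
    return delta, 100.0 * delta / baseline_ms


if __name__ == "__main__":
    for name, mean_ms in RESULTS_MS.items():
        delta, pct = relative_change(mean_ms)
        print(f"{name}: {delta:+.2f} ms ({pct:+.2f}%)")
    # → +7.26 ms (+3.08%), +6.70 ms (+2.84%), -0.70 ms (-0.30%)
```

Relative to the pre-#14192 baseline, PR #14665 recovers only 0.56 ms of the 7.26 ms regression. With this patch applied, the mean is 0.70 ms below the baseline, which is larger than either run's standard deviation.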
CC: [@szha|https://github.com/szha] [@zheng-da|https://github.com/zheng-da]
please review :-)