mbrookhart opened a new pull request #5755: URL: https://github.com/apache/incubator-tvm/pull/5755
This makes `infer_value` and `infer_value_simulated` run one op at a time to prevent recompiling and rerunning the same graph over and over. If the old version ran in `t = A * N * M`, where `N` is the number of nodes in the graph and `M` is the number of times `infer_value` is called, this version runs in `t = B * N`, but with `B >> A`. That means some models slow down during import, while very long-running models speed up.

Import performance testing on my laptop with models from the ONNX model zoo and HuggingFace:

RN50 (https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v1/resnet50v1.tar.gz):
- master: 5.05 s
- this: 6.05 s

BERT-Squad (https://github.com/onnx/models/raw/master/text/machine_comprehension/bert-squad/model/bertsquad-8.tar.gz):
- master: 68.8 s
- this: 158.5 s

HuggingFace BERT (`transformers.TFBertForSequenceClassification.from_pretrained('bert-base-cased')`):
- master: 567.1 s
- this: 122.5 s

@jwfromm @masahi Do you think this is worth including?
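The cost tradeoff can be sketched abstractly. This is a hypothetical cost model, not the TVM code itself: `eval_whole_graph`, `eval_one_op_at_a_time`, and the cost constants are made-up illustrations of the `A * N * M` vs `B * N` behavior, where caching per-op results means each node is only paid for once across all `infer_value` calls.

```python
# Hypothetical sketch (not actual TVM code) of the complexity tradeoff:
# recompiling/rerunning the whole graph on every infer_value call vs.
# evaluating one op at a time with a per-node value cache.

def eval_whole_graph(nodes, per_node_cost=10):
    """Old scheme: every call recompiles and reruns all N nodes (cost ~ A * N)."""
    return per_node_cost * len(nodes)

def eval_one_op_at_a_time(nodes, cache, per_node_cost=100):
    """New scheme: only nodes missing from the cache are computed.
    Per-node cost B is much higher than A, but it is paid once total."""
    cost = 0
    for node in nodes:
        if node not in cache:
            cache[node] = True   # simulate computing and caching this node's value
            cost += per_node_cost
    return cost

N_NODES = 1000   # N: nodes in the graph
M_CALLS = 50     # M: number of infer_value calls
nodes = list(range(N_NODES))

# Old: A * N * M -> grows with the number of calls.
old_total = sum(eval_whole_graph(nodes) for _ in range(M_CALLS))

# New: B * N -> the first call pays for everything, later calls hit the cache.
cache = {}
new_total = sum(eval_one_op_at_a_time(nodes, cache) for _ in range(M_CALLS))

print(old_total, new_total)  # 500000 100000
```

With a small `M` (few `infer_value` calls) the higher per-node constant `B` dominates and the new scheme loses, which matches the RN50 slowdown; with a large `M` the `* M` factor dominates and the new scheme wins, as with HuggingFace BERT.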
