mbrookhart opened a new pull request #5755:
URL: https://github.com/apache/incubator-tvm/pull/5755


   This makes `infer_value` and `infer_value_simulated` run one op at a time to 
prevent recompiling/rerunning the same graph over and over.
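   A minimal, generic sketch of the idea (not the TVM implementation; `Node` and the toy `infer_value` below are illustrative names): evaluate a dataflow graph one node at a time, caching each node's value so repeated queries never re-run shared subgraphs.

   ```python
   class Node:
       def __init__(self, op, *inputs):
           self.op = op          # callable producing this node's value
           self.inputs = inputs  # upstream Node dependencies

   _cache = {}

   def infer_value(node):
       # Evaluate one op at a time, reusing already-computed results
       # instead of re-running the whole graph for every query.
       if node not in _cache:
           args = [infer_value(i) for i in node.inputs]
           _cache[node] = node.op(*args)
       return _cache[node]

   # Example: a diamond-shaped graph where `a` feeds two consumers,
   # so caching avoids evaluating `a` twice.
   a = Node(lambda: 2)
   b = Node(lambda x: x + 3, a)
   c = Node(lambda x: x * 4, a)
   d = Node(lambda x, y: x + y, b, c)
   assert infer_value(d) == 13  # (2 + 3) + (2 * 4)
   ```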
   
   Where the old version ran in `t = A * N * M` (with `N` the number of nodes in 
the graph and `M` the number of times `infer_value` is called), this version runs 
in `t = B * N`, with per-op constant `B >> A`. As a result, models that call 
`infer_value` only a few times slow down during import, but models with very 
long-running imports speed up.
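   The trade-off can be sketched as a toy cost model (hypothetical helper names, not TVM code): the old strategy pays `A * N` for every one of the `M` calls, while the new strategy pays a larger per-op constant `B` but only once per node.

   ```python
   def cost_whole_graph(A, N, M):
       # Old behavior: each of the M infer_value calls recompiles and
       # reruns all N nodes at per-node cost A.
       return A * N * M

   def cost_one_op_at_a_time(B, N):
       # New behavior: each node is compiled/evaluated once at a higher
       # per-node cost B (B >> A), independent of M.
       return B * N

   # Illustrative constants: with few calls the old scheme is cheaper,
   # but once the graph is queried often enough the new scheme wins.
   A, B, N = 1, 20, 100
   assert cost_whole_graph(A, N, M=5) < cost_one_op_at_a_time(B, N)
   assert cost_whole_graph(A, N, M=50) > cost_one_op_at_a_time(B, N)
   ```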
   
   Import performance testing on my laptop with models from the ONNX Model Zoo 
and HuggingFace:
   
   RN50 
(https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v1/resnet50v1.tar.gz):
     master: 5.05 s
     this: 6.05 s
   BERT-Squad 
(https://github.com/onnx/models/raw/master/text/machine_comprehension/bert-squad/model/bertsquad-8.tar.gz):
     master: 68.8 s
     this: 158.5 s
   HuggingFace BERT 
(`transformers.TFBertForSequenceClassification.from_pretrained('bert-base-cased')`):
     master: 567.1 s
     this: 122.5 s
   
   @jwfromm @masahi Do you think this is worth including?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
