icemelon9 edited a comment on issue #4324: [Relay] Use opaque for where op
URL: https://github.com/apache/incubator-tvm/pull/4324#issuecomment-557727463
 
 
   After some benchmarking, I do see a slight improvement on the CPU 
instances. On C5.9xl, with where being opaque, the latency of BERT is 34.51ms 
(std: 0.59ms), while with where being broadcast, the latency is 36.06ms (std: 
0.50ms). The difference is about 1.5ms, or 4%. The difference on C5.4xl is 
smaller: opaque 46.99ms (std: 0.13ms) vs. broadcast 47.87ms (std: 0.24ms), a 
1.8% improvement. I also evaluated this on a GPU instance, P3.2xl, where I 
didn't see much difference in performance.
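   
   For reference, the percentages above can be reproduced with a small 
sketch like the one below. The helper name `relative_improvement` is only 
illustrative (it is not part of TVM), and the improvement is assumed to be 
measured relative to the broadcast latency.

```python
# Latencies in ms, as reported in this comment: (opaque, broadcast).
# `relative_improvement` is a hypothetical helper, not a TVM API.
def relative_improvement(opaque_ms, broadcast_ms):
    """Return (saving in ms, saving as a percentage of the broadcast latency)."""
    saved = broadcast_ms - opaque_ms
    return saved, 100.0 * saved / broadcast_ms


results = {
    "C5.9xl": (34.51, 36.06),
    "C5.4xl": (46.99, 47.87),
}

for instance, (opaque, broadcast) in results.items():
    saved, pct = relative_improvement(opaque, broadcast)
    print(f"{instance}: opaque is {saved:.2f} ms faster ({pct:.1f}%)")
```

   Note that the C5.9xl gap (about 1.5ms) is larger than the reported 
standard deviations (0.59ms and 0.50ms), so it is probably not just noise, 
but it is still small in absolute terms.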
   
   Given that the improvement is not very significant on CPU, I think we 
should not make this change for now. In the future, we could potentially have 
a performance-based op fusion pass that determines which kind of fusion gives 
the best performance.
   
   @tqchen @kevinthesun thoughts?

