FrozenGene commented on pull request #5914:
URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-650738036


   > I agree that the cache flush mechanism is useful for getting more precise 
measurements. It would be great if @FrozenGene could provide some experimental 
data to confirm this.
   > 
   > I vote for folding the cache flushing option into time_evaluator for 
succinctness. Making it more configurable and generic also sounds good to me.
   
   @yidawang The experimental data I have so far comes almost entirely from 
Ansor, mainly x86 winograd. For example, for a 1x7x7x512x512 winograd, the 
single-op time measured during tuning can reach 0.11 ms (on one Skylake AVX-512 
machine), but when the same op runs end to end it can cost several ms (sorry, I 
lost the exact number and only recorded the 0.11 ms). The cause is the constant 
matrix and the weight (for example, a 3x3x512x512 weight becomes 6x6x512x512 if 
the tile size is 4): they stay in cache during tuning but not in e2e execution.
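To make the size of that weight concrete, here is a rough sketch of the arithmetic. It assumes a 3x3 kernel with 512 input and 512 output channels, float32 storage, and the usual winograd relation alpha = tile + kernel - 1; the helper names are made up for illustration and are not TVM functions.

```python
def winograd_weight_shape(kh, kw, cin, cout, tile):
    """Shape of the pre-transformed weight, assuming alpha = tile + kernel - 1."""
    return (tile + kh - 1, tile + kw - 1, cin, cout)


def nbytes(shape, itemsize=4):
    """Size in bytes of a dense tensor; itemsize=4 assumes float32."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total


original = (3, 3, 512, 512)
transformed = winograd_weight_shape(3, 3, 512, 512, tile=4)

MIB = 1024 * 1024
print("original   :", original, nbytes(original) / MIB, "MiB")
print("transformed:", transformed, nbytes(transformed) / MIB, "MiB")
```

Under these assumptions the transformed weight is 36 MiB versus 9 MiB for the original. Whether that fits in the last-level cache depends on the machine, but a repeated single-op measurement re-reads the same buffers each run, while an e2e run touches many other tensors in between.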
   
   Another benefit of adding clflush is that we no longer need min_repeat_ms 
(like 1000), because each measurement is already precise. In this PR we set 
repeat to only 10, which reduces tuning time.
   
   I am collecting AutoTVM resnet18 data on one Skylake machine and will share 
it as soon as it completes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

