elvin-n edited a comment on pull request #8381:
URL: https://github.com/apache/tvm/pull/8381#issuecomment-872864628


   @masahi 
   I observed much better performance on Iris Xe graphics with the intel_graphics target than with the generic GPU one.
   On ONNX MobileNet v2: generic OpenCL auto-scheduler - 14ms; AutoTVM with IG conv2d_NCHWc - 16ms; auto-scheduler conv_packed_data - 9ms; after some fixes in IG conv2d_NCHWc and the auto-scheduler - 4ms. I have not created a PR yet since I have some questions about the IG primitives and wanted to verify everything on the previous generation of Intel Graphics.
   Eventually I was able to find an Intel Graphics machine (UHD Graphics 630), and there the situation was much less optimistic; I am still continuing the evaluations.
   1. Intel Graphics introduces two convolutions - NCHWc and packed_data (in fact a convolution blocked by output channels, with a further mandatory conversion back to the planar layout). Both of these convolutions had accuracy issues; I fixed it only for packed_data and PR #8201 was accepted, but I have not fixed the accuracy of NCHWc yet.
   2. The compute for IG conv2d_NCHWc assumes different flows depending on the scheduling parameters selected by AutoTVM: it sometimes adds a repack by spatial dimensions and sometimes does not, and AutoTVM with the default tuner aborts tuning, complaining about a differing number of features. This is another reason why I have not created a PR yet - NCHWc needs to be significantly modified, and I wanted to make sure that I do not break anything valuable.
   3. I would say that in many cases the NCHWc kernels themselves are better than generic OpenCL, but the additional NCHW <-> NCHWc data conversions often eat all of the performance gain and can even make the topology run slower.
   4. I have not finished the comparison on UHD630. At the same time, I see a 2x slowdown on this platform compared to competitors and want to take a look into this as well.
   5. It would probably be more correct to add a `model` distinguishing UHD and Iris Xe in addition to `-device=intel_graphics`, but there are several independent places in TVM dealing with device/model, and it is hard for me to figure out the right scope of changes for adding a model.
   
   A few more questions for you:
   1. Which topologies did you use for your experiments with Skylake graphics?
   2. Can you try to tune with `target="opencl -device=intel_graphics"`?

