[GitHub] [tvm] billishyahao commented on pull request #11513: [BYOC][DNNL] Improve performance of DNNL BYOC dense operator

GitBox Thu, 09 Jun 2022 01:59:32 -0700


billishyahao commented on PR #11513:
URL: https://github.com/apache/tvm/pull/11513#issuecomment-1150858975


   > One important comment about performance, just to point it out.
   > 
   > In this patch you rely on automatic detection of the proper layout inside
   > dnnl_json_runtime. It works correctly, and the dense primitive will use the
   > optimal layout. But it will execute weight reordering on each inference call.
   > This reordering significantly hurts performance (still better than before,
   > but less than what is possible).
   > 
   > To avoid this, the weight reordering should be done once during `Init`. For
   > that you need to change the dense weight pattern from `wildcard` to `is_constant`.
   
   Hi @apeskov, the following is an excerpt from the oneDNN verbose log:
   
   
`onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0400391
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc1024,0.0717773
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0351562
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user attr-post-ops:eltwise_gelu_erf ,,mb49ic512oc2048,0.215088
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic2048oc512,0.227051
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0339355
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc1024,0.072998
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0349121
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user attr-post-ops:eltwise_gelu_erf ,,mb49ic512oc2048,0.226807
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic2048oc512,0.231934`
   
   I don't see any reorder primitive executed before or after 
inner_product, so I think the current mechanism still works?
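
For anyone who wants to repeat this check on a longer log, here is a minimal sketch that counts executed primitives by kind. It assumes the comma-separated layout seen in the excerpt above, where the fourth field is the primitive kind; other oneDNN versions may lay the fields out differently. The helper name `count_primitives` is hypothetical and is not part of TVM or oneDNN.

```python
from collections import Counter


def count_primitives(log_text):
    """Count executed primitives by kind in oneDNN verbose output.

    Assumed line layout (taken from the excerpt in this comment):
    onednn_verbose,exec,<engine>,<primitive kind>,<implementation>,...
    """
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.strip().split(",")
        # Skip anything that is not an "exec" record of the verbose log.
        if len(fields) > 3 and fields[0] == "onednn_verbose" and fields[1] == "exec":
            counts[fields[3]] += 1
    return counts


# Two records copied from the log excerpt in this comment.
sample = "\n".join([
    "onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,"
    "src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 "
    "dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0400391",
    "onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,"
    "src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 "
    "dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic2048oc512,0.227051",
])

counts = count_primitives(sample)
print(dict(counts))
print("reorder executions:", counts["reorder"])
```

A `reorder` count of zero in a log captured during steady-state inference would be consistent with the observation above. It does not by itself show where, or whether, the weights were converted to the blocked layout.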
   
   


