billishyahao commented on PR #11513:
URL: https://github.com/apache/tvm/pull/11513#issuecomment-1150858975
> One important comment about performance. Just to point out.
>
> In this patch you are using the mechanism of auto-detecting the proper
> layout inside dnnl_json_runtime. It works correctly and the dense primitive
> will use the optimal layout. But it will execute weight reordering on each
> inference call. This reordering significantly hurts performance (still
> better than before, but less than possible).
>
> To avoid weight reordering, it should be done once during `Init`. For that
> you need to change the dense weight pattern from `wildcard` to `is_constant`.

Hi @apeskov, the following is a clip of the dnnl verbose log:
`onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0400391
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc1024,0.0717773
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0351562
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user attr-post-ops:eltwise_gelu_erf ,,mb49ic512oc2048,0.215088
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic2048oc512,0.227051
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0339355
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc1024,0.072998
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0349121
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user attr-post-ops:eltwise_gelu_erf ,,mb49ic512oc2048,0.226807
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic2048oc512,0.231934`

I don't see a reorder primitive executed before or after any inner_product
call in this log, so I think the current mechanism still works?
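
A check like this can be automated rather than read off by eye. The snippet below is a minimal sketch, not part of the PR: it assumes the comma-separated layout seen in the log lines above (marker, `exec`, engine, primitive kind, ...), and the helper name `count_primitives` is hypothetical. It counts executed primitives by kind so that any `reorder` entry stands out.

```python
from collections import Counter


def count_primitives(log_text):
    """Count executed primitives by kind in oneDNN verbose output.

    Assumes each execution record starts with
    onednn_verbose,exec,<engine>,<primitive kind>,... as in the log above.
    """
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) > 3 and fields[0] == "onednn_verbose" and fields[1] == "exec":
            counts[fields[3]] += 1
    return counts


# Two records copied from the log above, used here as sample input.
sample = """\
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0400391
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0351562
"""

counts = count_primitives(sample)
print(dict(counts))
print("reorder executed:", counts["reorder"] > 0)
```

Running it over the full verbose output of one inference call would show whether a `reorder` record appears next to the `inner_product` records. A zero count only covers the captured portion of the log, so the capture should span at least one complete inference call.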