kpuatamazon commented on pull request #19562:
URL: https://github.com/apache/incubator-mxnet/pull/19562#issuecomment-744892721
I made the sizing more systematic.
AVX512 means the function is compiled with
`__attribute__((target("avx512f,avx512bw,avx512cd,avx512dq,avx512vnni")))`.
Inverse means that 1.f/std is computed once in advance and multiplied in the
loop, rather than dividing by std in the loop.
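To make the "inverse" variant concrete, here is a minimal sketch. It assumes the kernel normalizes each row by its mean and standard deviation; the function names, the `RowStats` struct, and the `eps` parameter are hypothetical and not taken from the PR.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): mean and standard deviation of a row.
// Assumes the row is non-empty.
struct RowStats {
  float mean;
  float std_dev;
};

RowStats ComputeStats(const std::vector<float> &row, float eps) {
  const float n = static_cast<float>(row.size());
  float sum = 0.f;
  for (float v : row) sum += v;
  const float mean = sum / n;
  float sq = 0.f;
  for (float v : row) sq += (v - mean) * (v - mean);
  return {mean, std::sqrt(sq / n + eps)};
}

// Baseline: one floating-point division per element inside the loop.
void NormalizeRowDivide(std::vector<float> &row, float eps) {
  const RowStats s = ComputeStats(row, eps);
  for (float &v : row) v = (v - s.mean) / s.std_dev;
}

// "Inverse" variant: a single division before the loop, then only
// multiplications inside it.
void NormalizeRowInverse(std::vector<float> &row, float eps) {
  const RowStats s = ComputeStats(row, eps);
  const float inv_std = 1.f / s.std_dev;
  for (float &v : row) v = (v - s.mean) * inv_std;
}
```

The two variants can differ in the last bits of the result, because multiplying by a rounded reciprocal is not bit-identical to dividing.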
Overall, the Marian implementation seems to win on smaller problem sizes,
including the x512 sizes from @fhieber, but loses on larger problem sizes.
Of course there are edge cases when the width is not a multiple of 16, and
gcc is testing for those edge cases every time, so I see how that could be
optimized.
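One possible way to keep the edge case out of the hot path is to split the width once into full blocks of 16 floats and a scalar remainder. This is only a sketch of that idea, not the PR's code and not a description of what gcc generates; `NormalizeBlocked` and `kLanes` are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical constant: number of 32-bit floats in one 512-bit register.
constexpr std::size_t kLanes = 16;

// Writes (in[i] - mean) * inv_std to out[i] for i in [0, width).
// The width is split once into full 16-float blocks plus a remainder, so
// the blocked loop carries no per-iteration remainder check.
void NormalizeBlocked(const float *in, float *out, std::size_t width,
                      float mean, float inv_std) {
  const std::size_t full = width - width % kLanes;

  // Main loop: fixed trip count of kLanes in the inner loop, which a
  // compiler can map to full-width vector instructions.
  for (std::size_t i = 0; i < full; i += kLanes) {
    for (std::size_t j = 0; j < kLanes; ++j) {
      out[i + j] = (in[i + j] - mean) * inv_std;
    }
  }

  // Tail: at most kLanes - 1 elements, skipped entirely when the width
  // is a multiple of 16.
  for (std::size_t i = full; i < width; ++i) {
    out[i] = (in[i] - mean) * inv_std;
  }
}
```

Whether this helps would have to be measured; the timings in the table below do not cover it.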
| Shape | Marian | + AVX512 | + AVX512 + inverse | oneDNN |
|--|--|--|--|--|
| 1x 3 | 0.0000363| 0.0000370| 0.0000364| 0.0000453|
| 5x 3 | 0.0000337| 0.0000355| 0.0000348| 0.0000426|
| 10x 3 | 0.0000346| 0.0000352| 0.0000344| 0.0000434|
| 20x 3 | 0.0000337| 0.0000342| 0.0000349| 0.0000438|
| 30x 3 | 0.0000354| 0.0000376| 0.0000371| 0.0000434|
| 40x 3 | 0.0000373| 0.0000396| 0.0000382| 0.0000431|
| 50x 3 | 0.0000381| 0.0000405| 0.0000393| 0.0000436|
| 60x 3 | 0.0000390| 0.0000408| 0.0000403| 0.0000447|
| 70x 3 | 0.0000391| 0.0000376| 0.0000411| 0.0000440|
| 80x 3 | 0.0000403| 0.0000400| 0.0000414| 0.0000438|
| 90x 3 | 0.0000399| 0.0000399| 0.0000392| 0.0000446|
| 100x 3 | 0.0000378| 0.0000409| 0.0000399| 0.0000451|
| 110x 3 | 0.0000385| 0.0000414| 0.0000398| 0.0000446|
| 120x 3 | 0.0000390| 0.0000413| 0.0000417| 0.0000454|
| 130x 3 | 0.0000389| 0.0000429| 0.0000420| 0.0000457|
| 140x 3 | 0.0000402| 0.0000437| 0.0000436| 0.0000452|
| 150x 3 | 0.0000403| 0.0000442| 0.0000432| 0.0000461|
| 200x 3 | 0.0000432| 0.0000480| 0.0000466| 0.0000476|
| 300x 3 | 0.0000476| 0.0000553| 0.0000533| 0.0000506|
| 400x 3 | 0.0000532| 0.0000617| 0.0000604| 0.0000533|
| 500x 3 | 0.0000575| 0.0000694| 0.0000670| 0.0000570|
| 1000x 3 | 0.0000826| 0.0001037| 0.0001029| 0.0000713|
| 2000x 3 | 0.0001340| 0.0001730| 0.0001706| 0.0000988|
| 3000x 3 | 0.0001818| 0.0002431| 0.0002402| 0.0001275|
| 4000x 3 | 0.0002298| 0.0003116| 0.0003092| 0.0001554|
| 5000x 3 | 0.0002777| 0.0003804| 0.0003814| 0.0001832|
|16384x 3 | 0.0008319| 0.0011868| 0.0012013| 0.0005041|
| 1x 10 | 0.0000339| 0.0000387| 0.0000348| 0.0000412|
| 5x 10 | 0.0000336| 0.0000346| 0.0000345| 0.0000423|
| 10x 10 | 0.0000349| 0.0000347| 0.0000343| 0.0000426|
| 20x 10 | 0.0000339| 0.0000369| 0.0000374| 0.0000423|
| 30x 10 | 0.0000371| 0.0000391| 0.0000378| 0.0000439|
| 40x 10 | 0.0000378| 0.0000397| 0.0000395| 0.0000433|
| 50x 10 | 0.0000393| 0.0000407| 0.0000408| 0.0000440|
| 60x 10 | 0.0000398| 0.0000411| 0.0000423| 0.0000444|
| 70x 10 | 0.0000403| 0.0000399| 0.0000416| 0.0000445|
| 80x 10 | 0.0000411| 0.0000414| 0.0000398| 0.0000449|
| 90x 10 | 0.0000410| 0.0000417| 0.0000406| 0.0000452|
| 100x 10 | 0.0000391| 0.0000424| 0.0000416| 0.0000457|
| 110x 10 | 0.0000403| 0.0000433| 0.0000425| 0.0000460|
| 120x 10 | 0.0000404| 0.0000446| 0.0000433| 0.0000465|
| 130x 10 | 0.0000414| 0.0000450| 0.0000440| 0.0000463|
| 140x 10 | 0.0000413| 0.0000460| 0.0000455| 0.0000467|
| 150x 10 | 0.0000420| 0.0000474| 0.0000455| 0.0000471|
| 200x 10 | 0.0000457| 0.0000511| 0.0000497| 0.0000496|
| 300x 10 | 0.0000517| 0.0000598| 0.0000579| 0.0000522|
| 400x 10 | 0.0000584| 0.0000697| 0.0000660| 0.0000560|
| 500x 10 | 0.0000646| 0.0000772| 0.0000759| 0.0000592|
| 1000x 10 | 0.0000956| 0.0001229| 0.0001169| 0.0000773|
| 2000x 10 | 0.0001586| 0.0002063| 0.0002010| 0.0001119|
| 3000x 10 | 0.0002192| 0.0002946| 0.0002856| 0.0001441|
| 4000x 10 | 0.0002827| 0.0003824| 0.0003686| 0.0001793|
| 5000x 10 | 0.0003435| 0.0004679| 0.0004548| 0.0002134|
|16384x 10 | 0.0010506| 0.0015780| 0.0015417| 0.0006026|
| 1x 100 | 0.0000348| 0.0000379| 0.0000405| 0.0000427|
| 5x 100 | 0.0000348| 0.0000346| 0.0000343| 0.0000433|
| 10x 100 | 0.0000337| 0.0000350| 0.0000367| 0.0000436|
| 20x 100 | 0.0000401| 0.0000388| 0.0000395| 0.0000445|
| 30x 100 | 0.0000411| 0.0000406| 0.0000393| 0.0000452|
| 40x 100 | 0.0000379| 0.0000381| 0.0000375| 0.0000462|
| 50x 100 | 0.0000391| 0.0000399| 0.0000384| 0.0000466|
| 60x 100 | 0.0000394| 0.0000412| 0.0000393| 0.0000474|
| 70x 100 | 0.0000422| 0.0000428| 0.0000408| 0.0000489|
| 80x 100 | 0.0000433| 0.0000439| 0.0000408| 0.0000492|
| 90x 100 | 0.0000436| 0.0000445| 0.0000425| 0.0000500|
| 100x 100 | 0.0000448| 0.0000458| 0.0000435| 0.0000510|
| 110x 100 | 0.0000467| 0.0000476| 0.0000448| 0.0000514|
| 120x 100 | 0.0000466| 0.0000481| 0.0000456| 0.0000527|
| 130x 100 | 0.0000487| 0.0000501| 0.0000469| 0.0000529|
| 140x 100 | 0.0000501| 0.0000515| 0.0000479| 0.0000538|
| 150x 100 | 0.0000503| 0.0000517| 0.0000494| 0.0000550|
| 200x 100 | 0.0000556| 0.0000595| 0.0000539| 0.0000592|
| 300x 100 | 0.0000670| 0.0000708| 0.0000639| 0.0000678|
| 400x 100 | 0.0000782| 0.0000825| 0.0000748| 0.0000727|
| 500x 100 | 0.0000898| 0.0000946| 0.0000857| 0.0000824|
| 1000x 100 | 0.0001492| 0.0001591| 0.0001409| 0.0001263|
| 2000x 100 | 0.0002653| 0.0002819| 0.0002536| 0.0002101|
| 3000x 100 | 0.0003822| 0.0004043| 0.0003598| 0.0002898|
| 4000x 100 | 0.0004926| 0.0005266| 0.0004686| 0.0003655|
| 5000x 100 | 0.0006051| 0.0006524| 0.0005765| 0.0004505|
|16384x 100 | 0.0020228| 0.0021531| 0.0019262| 0.0014567|
| 1x 256 | 0.0000374| 0.0000397| 0.0000358| 0.0000434|
| 5x 256 | 0.0000336| 0.0000409| 0.0000335| 0.0000434|
| 10x 256 | 0.0000399| 0.0000370| 0.0000404| 0.0000436|
| 20x 256 | 0.0000423| 0.0000408| 0.0000400| 0.0000450|
| 30x 256 | 0.0000383| 0.0000371| 0.0000373| 0.0000463|
| 40x 256 | 0.0000411| 0.0000394| 0.0000384| 0.0000469|
| 50x 256 | 0.0000418| 0.0000411| 0.0000386| 0.0000476|
| 60x 256 | 0.0000431| 0.0000424| 0.0000407| 0.0000481|
| 70x 256 | 0.0000455| 0.0000441| 0.0000419| 0.0000495|
| 80x 256 | 0.0000465| 0.0000456| 0.0000433| 0.0000496|
| 90x 256 | 0.0000493| 0.0000476| 0.0000445| 0.0000510|
| 100x 256 | 0.0000502| 0.0000495| 0.0000457| 0.0000522|
| 110x 256 | 0.0000524| 0.0000500| 0.0000467| 0.0000534|
| 120x 256 | 0.0000535| 0.0000517| 0.0000475| 0.0000535|
| 130x 256 | 0.0000554| 0.0000534| 0.0000493| 0.0000549|
| 140x 256 | 0.0000573| 0.0000551| 0.0000512| 0.0000553|
| 150x 256 | 0.0000597| 0.0000570| 0.0000521| 0.0000568|
| 200x 256 | 0.0000679| 0.0000639| 0.0000581| 0.0000631|
| 300x 256 | 0.0000850| 0.0000826| 0.0000713| 0.0000709|
| 400x 256 | 0.0001040| 0.0000962| 0.0000854| 0.0000832|
| 500x 256 | 0.0001231| 0.0001130| 0.0001000| 0.0000967|
| 1000x 256 | 0.0002105| 0.0001881| 0.0001694| 0.0001590|
| 2000x 256 | 0.0003913| 0.0003506| 0.0003000| 0.0002847|
| 3000x 256 | 0.0005685| 0.0005093| 0.0004264| 0.0004101|
| 4000x 256 | 0.0007300| 0.0006509| 0.0005445| 0.0005121|
| 5000x 256 | 0.0009223| 0.0008244| 0.0006968| 0.0006512|
|16384x 256 | 0.0036732| 0.0032559| 0.0029458| 0.0025328|
| 1x 512 | 0.0000366| 0.0000392| 0.0000344| 0.0000438|
| 5x 512 | 0.0000394| 0.0000354| 0.0000346| 0.0000441|
| 10x 512 | 0.0000408| 0.0000401| 0.0000381| 0.0000448|
| 20x 512 | 0.0000444| 0.0000435| 0.0000389| 0.0000453|
| 30x 512 | 0.0000410| 0.0000403| 0.0000386| 0.0000474|
| 40x 512 | 0.0000446| 0.0000429| 0.0000403| 0.0000481|
| 50x 512 | 0.0000476| 0.0000463| 0.0000428| 0.0000496|
| 60x 512 | 0.0000507| 0.0000478| 0.0000440| 0.0000510|
| 70x 512 | 0.0000539| 0.0000502| 0.0000458| 0.0000518|
| 80x 512 | 0.0000577| 0.0000538| 0.0000476| 0.0000537|
| 90x 512 | 0.0000602| 0.0000558| 0.0000504| 0.0000572|
| 100x 512 | 0.0000616| 0.0000582| 0.0000518| 0.0000565|
| 110x 512 | 0.0000667| 0.0000607| 0.0000556| 0.0000600|
| 120x 512 | 0.0000689| 0.0000642| 0.0000576| 0.0000612|
| 130x 512 | 0.0000735| 0.0000654| 0.0000587| 0.0000616|
| 140x 512 | 0.0000744| 0.0000695| 0.0000607| 0.0000661|
| 150x 512 | 0.0000759| 0.0000695| 0.0000636| 0.0000653|
| 200x 512 | 0.0000913| 0.0000831| 0.0000738| 0.0000793|
| 300x 512 | 0.0001299| 0.0001136| 0.0001037| 0.0001073|
| 400x 512 | 0.0001585| 0.0001409| 0.0001313| 0.0001338|
| 500x 512 | 0.0001883| 0.0001598| 0.0001498| 0.0001580|
| 1000x 512 | 0.0003369| 0.0002909| 0.0002665| 0.0002632|
| 2000x 512 | 0.0006396| 0.0005457| 0.0004962| 0.0004790|
| 3000x 512 | 0.0009528| 0.0008094| 0.0007397| 0.0006964|
| 4000x 512 | 0.0013894| 0.0011007| 0.0010205| 0.0009181|
| 5000x 512 | 0.0018264| 0.0015710| 0.0015030| 0.0012737|
|16384x 512 | 0.0074474| 0.0067132| 0.0065738| 0.0060986|
| 1x 1024 | 0.0000347| 0.0000357| 0.0000356| 0.0000445|
| 5x 1024 | 0.0000391| 0.0000385| 0.0000362| 0.0000450|
| 10x 1024 | 0.0000383| 0.0000410| 0.0000393| 0.0000459|
| 20x 1024 | 0.0000439| 0.0000408| 0.0000384| 0.0000493|
| 30x 1024 | 0.0000489| 0.0000458| 0.0000417| 0.0000504|
| 40x 1024 | 0.0000557| 0.0000516| 0.0000448| 0.0000527|
| 50x 1024 | 0.0000587| 0.0000545| 0.0000490| 0.0000546|
| 60x 1024 | 0.0000650| 0.0000586| 0.0000516| 0.0000610|
| 70x 1024 | 0.0000707| 0.0000617| 0.0000563| 0.0000616|
| 80x 1024 | 0.0000768| 0.0000683| 0.0000618| 0.0000659|
| 90x 1024 | 0.0000834| 0.0000735| 0.0000695| 0.0000716|
| 100x 1024 | 0.0000886| 0.0000759| 0.0000708| 0.0000757|
| 110x 1024 | 0.0000949| 0.0000833| 0.0000779| 0.0000826|
| 120x 1024 | 0.0001031| 0.0000882| 0.0000841| 0.0000858|
| 130x 1024 | 0.0001088| 0.0000956| 0.0000887| 0.0000903|
| 140x 1024 | 0.0001156| 0.0001010| 0.0000923| 0.0000969|
| 150x 1024 | 0.0001152| 0.0001057| 0.0000978| 0.0001028|
| 200x 1024 | 0.0001450| 0.0001328| 0.0001257| 0.0001313|
| 300x 1024 | 0.0002082| 0.0001793| 0.0001721| 0.0001762|
| 400x 1024 | 0.0002650| 0.0002286| 0.0002192| 0.0002239|
| 500x 1024 | 0.0003157| 0.0002699| 0.0002596| 0.0002592|
| 1000x 1024 | 0.0005968| 0.0005075| 0.0004867| 0.0004764|
| 2000x 1024 | 0.0012529| 0.0010099| 0.0009831| 0.0009186|
| 3000x 1024 | 0.0021209| 0.0017632| 0.0018832| 0.0017245|
| 4000x 1024 | 0.0029924| 0.0025091| 0.0027886| 0.0024449|
| 5000x 1024 | 0.0040340| 0.0034140| 0.0037645| 0.0033816|
|16384x 1024 | 0.0149200| 0.0130993| 0.0142042| 0.0132403|