The short answer is the LINPACK benchmark. For years, computers, especially
supercomputers, have been benchmarked on how fast they perform this particular
calculation. As a result, the code actually used in the various accelerated
BLAS implementations is highly tuned.
