On 06/10/2018 08:01 PM, Walter Bright wrote:
> On 6/10/2018 4:39 PM, David Nadlinger wrote:
>> That's not entirely true. Intel started optimising some of the REP
>> string instructions again on Ivy Bridge and above. There is a CPUID
>> bit to indicate that (ERMS?); I'm sure the Optimization Manual has
>> further details. From what I remember, `rep movsb` is supposed to beat
>> an AVX loop on most recent Intel µarchs if the destination is aligned
>> and the data is longer than a few cache
>
> The drama of which instruction mix is faster on which CPU never abates!
> In many ways, I really miss 80's machine architecture ;) So simple.
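
For anyone who wants to poke at the CPUID bit mentioned above: as far as I know, ERMS (Enhanced REP MOVSB/STOSB) is reported in CPUID leaf 7, subleaf 0, EBX bit 9. Below is a rough sketch in C, assuming GCC or Clang on x86 (it uses `__get_cpuid_count` from `<cpuid.h>` and GNU inline asm). The names `ebx_reports_erms`, `cpu_has_erms` and `copy_bytes` are made up for illustration, and on other targets the code simply falls back to `memcpy`. It only shows how the check and the instruction fit together; it says nothing about which path is faster on a given CPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only use CPUID and inline asm on x86 with a GNU-compatible compiler. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* ERMS flag position: CPUID.(EAX=7, ECX=0):EBX bit 9. */
#define ERMS_EBX_BIT 9u

/* Hypothetical helper: test the ERMS bit in a raw EBX value. */
static bool ebx_reports_erms(uint32_t ebx) {
    return ((ebx >> ERMS_EBX_BIT) & 1u) != 0;
}

/* Hypothetical helper: query the running CPU; false on non-x86 builds. */
static bool cpu_has_erms(void) {
#if HAVE_X86_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* Returns 0 when leaf 7 is not supported by this CPU. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ebx_reports_erms(ebx);
#else
    return false;
#endif
}

/* Hypothetical copy routine: `rep movsb` when ERMS is reported,
 * plain memcpy otherwise. Assumes the direction flag is clear,
 * as the usual C ABIs guarantee at function entry. */
static void copy_bytes(void *dst, const void *src, size_t n) {
#if HAVE_X86_CPUID
    if (cpu_has_erms()) {
        __asm__ volatile("rep movsb"
                         : "+D"(dst), "+S"(src), "+c"(n)
                         :
                         : "memory");
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

Calling `copy_bytes(dst, src, len)` behaves like `memcpy` for non-overlapping buffers either way. A real implementation would cache the CPUID result instead of querying it on every call, and would have to be benchmarked per microarchitecture before preferring one path over the other.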