https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #18 from Andrew Roberts <andrewm.roberts at sky dot com> ---
OK, trying an entirely different algorithm gives the same results:

Using Mersenne Twister algorithm from here:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html

Alter the main program to comment out the original test harness, and
replace main with:

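#include <time.h>   /* clock(), clock_t */

int main(void)
{
    int i;
    unsigned long init[4]={0x123, 0x234, 0x345, 0x456}, length=4;
    init_by_array(init, length);          /* fixed seed, so j is reproducible */
    clock_t e, s=clock();                 /* start of timed region */
    int j=genrand_int32();
    for(i=0; i<100000000; i++)
    {
      j ^= genrand_int32();               /* fold results so the calls are not optimized away */
    }
    e=clock();                            /* end of timed region */
    /* sanity check that every build computes the same value */
    if (j != -549769613) printf("Error j != -549769613 (%d)\n", j);
    printf("mt19937ar took %ld clocks ", (long)(e-s));
    return 0;
}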

So nothing complicated.
On Ryzen:
--------

Top 5:
mt19937ar took 354877 clocks -march=amdfam10 -mtune=k8
mt19937ar took 356203 clocks -march=bdver2 -mtune=eden-x2
mt19937ar took 356534 clocks -march=nano-x2 -mtune=nano-1000
mt19937ar took 357321 clocks -march=athlon-fx -mtune=nano-x4
mt19937ar took 357634 clocks -march=bdver3 -mtune=nano-x2

Bottom 5:
mt19937ar took 675052 clocks -march=nano -mtune=btver1
mt19937ar took 679826 clocks -march=k8 -mtune=nocona
mt19937ar took 681118 clocks -march=opteron -mtune=atom
mt19937ar took 689604 clocks -march=core2 -mtune=broadwell
mt19937ar took 699840 clocks -march=skylake -mtune=generic

Top -mtune=znver1
mt19937ar took 369722 clocks -march=nano-x2 -mtune=znver1

Top -march=znver1
mt19937ar took 375286 clocks -march=znver1 -mtune=silvermont

-march=znver1 -mtune=znver1 (aka native)
mt19937ar took 430875 clocks -march=znver1 -mtune=znver1

-march=haswell -mtune=haswell
mt19937ar took 402963 clocks -march=haswell -mtune=haswell

-march=k8 -mtune=k8
mt19937ar took 367890 clocks -march=k8 -mtune=k8

so -march=znver1 -mtune=znver1 is:
7% slower than tuning for haswell
17% slower than tuning for k8

Again, -mtune=znver1, -mtune=bdverX and -mtune=btverX all cluster at the bottom.

On Haswell:
----------

Top 5:
mt19937ar took 290000 clocks -march=amdfam10 -mtune=barcelona
mt19937ar took 290000 clocks -march=amdfam10 -mtune=bdver1
mt19937ar took 290000 clocks -march=amdfam10 -mtune=bdver2
mt19937ar took 290000 clocks -march=amdfam10 -mtune=bdver3
mt19937ar took 290000 clocks -march=amdfam10 -mtune=bdver4

Bottom 5:
mt19937ar took 370000 clocks -march=znver1 -mtune=bdver3
mt19937ar took 370000 clocks -march=znver1 -mtune=bdver4
mt19937ar took 370000 clocks -march=znver1 -mtune=btver2
mt19937ar took 370000 clocks -march=znver1 -mtune=znver1
mt19937ar took 380000 clocks -march=knl -mtune=bdver1

Top -mtune=haswell
mt19937ar took 300000 clocks -march=bdver4 -mtune=haswell

Top -march=haswell
mt19937ar took 300000 clocks -march=haswell -mtune=broadwell

-march=haswell -mtune=haswell (aka native)
mt19937ar took 300000 clocks -march=haswell -mtune=haswell

Best performing pair:
mt19937ar took 290000 clocks -march=barcelona -mtune=barcelona

So the haswell options are pretty much optimal on that hardware,
as in the other test.
