Hi,

first of all, thanks for all the great MPIR work! I've been using it for about 
4 years to compute visually compelling deep Mandelbrot zoom videos.

Yesterday I downloaded 3.0.0 and compiled it using VS 2015 U3 on an Intel 
Core i7 6900K (8 cores, Broadwell) running 64-bit Windows 10.

Unfortunately, 3.0.0 seems to be slower than 2.7.2 by about 8% when small 
floats are used. By small floats I mean a precision of up to 256 bits (4 limbs 
on x64).
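
As a quick check of the limb arithmetic above, here is a minimal sketch in 
Python. It is not MPIR code: limbs_for_bits is a hypothetical helper, it 
assumes 64-bit limbs, and it only counts the limbs needed to hold the requested 
mantissa bits, ignoring any extra limbs the library may allocate internally.

```python
def limbs_for_bits(bits, limb_bits=64):
    """Number of limbs needed to hold `bits` bits (ceiling division)."""
    if bits <= 0:
        raise ValueError("bits must be positive")
    # -(-a // b) is integer ceiling division without floating point.
    return -(-bits // limb_bits)


# Precisions mentioned in this post, assuming 64-bit limbs on x64.
for precision in (128, 256, 1024):
    print(precision, "bits ->", limbs_for_bits(precision), "limbs")
```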

Compilation worked flawlessly for all 10 architectures I selected. Just to 
make sure Visual Studio updates are not the source of the problem, I also 
recompiled 2.7.2 for the 7 architectures I had been testing with.

The stats below are based on several hundred million Mandelbrot iterations for 
each data point. All 16 threads of the 6900K are used and all of them are at 
100% capacity. 

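I get the following speedup matrix for 128-bit precision floats over all 
compiled versions and architectures. Each speedup entry compares the build in 
that row against a slower build, identified by the column number at the bottom: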

Results from file: Run_2017-03-04T20_10_18.xml; number model: GMP128
   1 mpir_3_0_0_x64_gc                          MFlops:    252.6
   2 mpir_2_7_2_x64_gc                          MFlops:    274.7 Speedup:     8.73%
   3 mpir_3_0_0_x64_haswell_avx                 MFlops:    357.9 Speedup:    41.67%    30.30%
   4 mpir_3_0_0_x64_skylake_avx                 MFlops:    365.9 Speedup:    44.82%    33.20%     2.22%
   5 mpir_3_0_0_x64_haswell                     MFlops:    368.1 Speedup:    45.72%    34.02%     2.86%     0.62%
   6 mpir_3_0_0_x64_skylake                     MFlops:    371.0 Speedup:    46.84%    35.05%     3.65%     1.39%     0.77%
   7 mpir_3_0_0_x64_core2                       MFlops:    377.0 Speedup:    49.23%    37.26%     5.34%     3.05%     2.41%     1.63%
   8 mpir_3_0_0_x64_sandybridge_ivybridge       MFlops:    386.7 Speedup:    53.07%    40.79%     8.05%     5.70%     5.05%     4.25%     2.57%
   9 mpir_3_0_0_x64_nehalem_westmere            MFlops:    389.3 Speedup:    54.10%    41.74%     8.78%     6.41%     5.76%     4.95%     3.26%     0.67%
  10 mpir_3_0_0_x64_nehalem                     MFlops:    389.5 Speedup:    54.19%    41.82%     8.84%     6.47%     5.82%     5.01%     3.32%     0.73%     0.06%
  11 mpir_3_0_0_x64_sandybridge                 MFlops:    395.1 Speedup:    56.39%    43.84%    10.39%     7.99%     7.33%     6.51%     4.80%     2.17%     1.48%     1.43%
  12 mpir_2_7_2_x64_haswell                     MFlops:    398.3 Speedup:    57.66%    45.01%    11.28%     8.87%     8.20%     7.37%     5.65%     3.00%     2.31%     2.25%     0.81%
  13 mpir_2_7_2_x64_sandybridge_ivybridge       MFlops:    404.3 Speedup:    60.04%    47.20%    12.97%    10.51%     9.83%     8.99%     7.24%     4.55%     3.85%     3.79%     2.33%     1.51%
  14 mpir_2_7_2_x64_sandybridge                 MFlops:    405.2 Speedup:    60.40%    47.52%    13.22%    10.76%    10.07%     9.23%     7.48%     4.78%     4.08%     4.02%     2.56%     1.74%     0.22%
  15 mpir_2_7_2_x64_nehalem_westmere            MFlops:    417.3 Speedup:    65.16%    51.91%    16.58%    14.05%    13.35%    12.48%    10.67%     7.90%     7.18%     7.12%     5.61%     4.76%     3.20%     2.97%
  16 mpir_2_7_2_x64_core2                       MFlops:    419.0 Speedup:    65.85%    52.54%    17.07%    14.53%    13.82%    12.95%    11.14%     8.35%     7.62%     7.56%     6.05%     5.20%     3.63%     3.40%     0.42%
  17 mpir_2_7_2_x64_nehalem                     MFlops:    422.8 Speedup:    67.37%    53.94%    18.14%    15.58%    14.86%    13.99%    12.16%     9.34%     8.61%     8.55%     7.02%     6.16%     4.58%     4.35%     1.34%     0.92%
                                                                                  1         2         3         4         5         6         7         8         9        10        11        12        13        14        15        16

I've taken these measurements three times with the same results.

The six fastest versions are all 2.7.2.
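
For anyone who wants to recompute the matrix, the speedup entries appear to be 
consistent with (MFlops of the row / MFlops of the column - 1) * 100. The 
sketch below is my assumption about how the benchmark tool derives them, not 
its actual code, and speedup_matrix is a hypothetical helper. Because the 
MFlops figures above are rounded to one decimal, recomputed values can differ 
from the table in the second decimal place (for example about 8.75% instead of 
8.73% for row 2).

```python
def speedup_matrix(mflops):
    """Return a lower-triangular matrix of percentage speedups.

    Entry [i][j] (for j < i) is how much faster build i is than build j,
    assuming speedup = (mflops[i] / mflops[j] - 1) * 100.
    """
    return [
        [(mflops[i] / mflops[j] - 1.0) * 100.0 for j in range(i)]
        for i in range(len(mflops))
    ]


# First three data points of the GMP128 run (rounded MFlops from the table).
builds = [
    ("mpir_3_0_0_x64_gc", 252.6),
    ("mpir_2_7_2_x64_gc", 274.7),
    ("mpir_3_0_0_x64_haswell_avx", 357.9),
]
matrix = speedup_matrix([m for _, m in builds])
for (name, _), row in zip(builds, matrix):
    print(f"{name:<30}", " ".join(f"{s:8.2f}%" for s in row))
```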

Note that the architecture-specific builds and the Broadwell CPU do not seem to 
be the issue, since the two slowest versions, the generic C builds 
mpir_3_0_0_x64_gc and mpir_2_7_2_x64_gc, also differ by about 8%. Both were 
compiled on the same machine within 5 minutes of each other with VS 2015.

Another hint that the architecture-specific builds and optimizations are 
working fine is that once I test with 1024 bits of precision, the fastest 
version is mpir_3_0_0_x64_skylake_avx (the Broadwell CPU used in this test 
already has most of the improvements of Skylake). Unfortunately, I very rarely 
zoom down to a magnification that needs 1024 bits.

I have not done any tuning yet, but my understanding is that for operands of 
1, 2 or 3 limbs it should not matter anyway.

Any hints or ideas on what I may be doing wrong?

Does this also happen on other OSes/CPUs?

Thanks and best regards,

Marcus
