Hi, guys. The patch for harmony-5901 is ready.

I compared performance on all 5 benchmarks in the "Java vs. C
benchmark" by Stefan Krause on my desktop
workstation (Intel Core 2 Quad [EMAIL PROTECTED], 3.23G RAM, Windows XP SP2).
With MUL/DIV replaced by shifts,
spectralnorm improves by more than 29% (from 450258 msec to 319578 msec), and
there is no noticeable change in the other
4 benchmarks.
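
To illustrate what replacing MUL/DIV with a shift means at the source level,
here is a minimal sketch. It is not the patch itself: the class and method
names are made up for this example, and it only covers int operands with a
power-of-two constant 2^k, 1 <= k <= 30. The point worth noting is that a
signed division cannot be replaced by a bare arithmetic shift, because Java's
`/` rounds toward zero while `>>` rounds toward negative infinity, so a bias
has to be added for negative dividends.

```java
public class ShiftStrengthReduction {
    // x * 2^k as a left shift. For int this matches the multiplication
    // exactly, including wrap-around on overflow.
    static int mulPow2(int x, int k) {
        return x << k;
    }

    // x / 2^k with Java's round-toward-zero semantics, for 1 <= k <= 30.
    // (x >> 31) is all ones for negative x and zero otherwise, so the
    // unsigned shift leaves a bias of 2^k - 1 for negative x and 0 otherwise.
    static int divPow2(int x, int k) {
        int bias = (x >> 31) >>> (32 - k);
        return (x + bias) >> k;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, -1, -7, -8, 12345, -12345,
                         Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            for (int k = 1; k <= 30; k++) {
                int pow = 1 << k;
                // Compare the shift forms against the plain operators.
                if (mulPow2(x, k) != x * pow || divPow2(x, k) != x / pow) {
                    throw new AssertionError("mismatch for x=" + x + ", k=" + k);
                }
            }
        }
        System.out.println("shift forms match * and / for all samples");
    }
}
```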

On the webpage http://www.stefankrause.net/wp/?p=9#more-9 you can see that
spectralnorm is the benchmark that drags down
Harmony's total score the most. The attachments contain the detailed experiment
inputs and results.

On further thought, I think we should do more for this "numerical
strength reduction" optimization. In the
Integer.parseInt() method, there is a hot *10 operation if the code is
inlined. In fact, gcc performs this optimization for
multiplications where one of the operands is a constant that is not too large. I'm
going to work on the generalized version over the next
few days.
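
As a rough sketch of the generalized idea, a multiplication by a small
constant can be decomposed into shifts and adds, e.g. 10 = 8 + 2, so
x * 10 == (x << 3) + (x << 1). The code below is only an illustration under
that assumption: times10 and parseDigits are hypothetical names, and
parseDigits is a simplified digit loop (non-negative decimal input, no
overflow checks), not Harmony's Integer.parseInt().

```java
public class MulByConstant {
    // x * 10 rewritten as (x * 8) + (x * 2) using shifts and one add.
    // For int this is identical to x * 10, including overflow wrap-around.
    static int times10(int x) {
        return (x << 3) + (x << 1);
    }

    // Simplified stand-in for the hot loop in a decimal parser:
    // each step multiplies the running result by 10 and adds the next digit.
    static int parseDigits(String s) {
        if (s.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException(s);
            }
            result = times10(result) + digit;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 12345, -12345,
                         Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // The shift/add form must agree with the plain multiplication.
            if (times10(x) != x * 10) {
                throw new AssertionError("mismatch for x=" + x);
            }
        }
        if (parseDigits("30000") != Integer.parseInt("30000")) {
            throw new AssertionError("parseDigits disagrees with Integer.parseInt");
        }
        System.out.println("shift/add form matches x * 10 for all samples");
    }
}
```

Whether this form is actually faster than a single multiply depends on the
target CPU, which is presumably why the constant has to be "not too large".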

Thanks.

Xiaoming

BENCHMARK               OPTIONS                         INPUT

mandelbrot_long         -Xem:opt                        100000
fannkuch_long           -Xem:opt                        13
himeno_bench2           -Xem:opt -Xmx1024m -Xms1024m    M
nbody_long              -Xem:opt                        1000000000
spectralnorm_long       -Xem:opt                        30000
==========Testing mandelbrot_long 100000========== 
Run # 0
duration =|1592197| msec. count=-325819012
==========Testing mandelbrot_long 100000========== 
Run # 0
duration =|1592116| msec. count=-325819012
==========Testing mandelbrot_long 100000========== 
Run # 0
duration =|1592279| msec. count=-325819012
==========Testing fannkuch_long 13========== 
Pfannkuchen(13) = 80
Duration |1100759|
==========Testing fannkuch_long 13========== 
Pfannkuchen(13) = 80
Duration |1100753|
==========Testing fannkuch_long 13========== 
Pfannkuchen(13) = 80
Duration |1100735|
==========Testing himeno_bench2 M========== 
mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
|26626|
|26611|
|26626|
|26627|
|26642|
|26610|
|26642|
|26611|
|26627|
|26610|
 Loop executed for 30 times
 Gosa : 0.0011446069938308213
 MFLOPS measured : 0.0  cpu : 266.23200011253357
cpu=|266.23200011253357| gosa=0.0011446069938308213
 Score based on Pentium III 600MHz using Fortran 77: 0.0
==========Testing himeno_bench2 M========== 
mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
|26752|
|26674|
|26673|
|26673|
|26689|
|26673|
|26673|
|26689|
|26674|
|26657|
 Loop executed for 30 times
 Gosa : 0.0011446069938308213
 MFLOPS measured : 0.0  cpu : 266.8420000076294
cpu=|266.8420000076294| gosa=0.0011446069938308213
 Score based on Pentium III 600MHz using Fortran 77: 0.0
==========Testing himeno_bench2 M========== 
mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
|26704|
|26720|
|26705|
|26689|
|26704|
|26705|
|26689|
|26688|
|26689|
|26705|
 Loop executed for 30 times
 Gosa : 0.0011446069938308213
 MFLOPS measured : 0.0  cpu : 266.9980001449585
cpu=|266.9980001449585| gosa=0.0011446069938308213
 Score based on Pentium III 600MHz using Fortran 77: 0.0
==========Testing nbody_long 1000000000========== 
-0.169075164
-0.169051539
Duration |723642|
==========Testing nbody_long 1000000000========== 
-0.169075164
-0.169051539
Duration |723548|
==========Testing nbody_long 1000000000========== 
-0.169075164
-0.169051539
Duration |723720|
==========Testing spectralnorm_long 30000========== 
1.274224153
Duration |450258|
==========Testing spectralnorm_long 30000========== 
1.274224153
Duration |450258|
==========Testing spectralnorm_long 30000========== 
1.274224153
Duration |450351|
==========Testing mandelbrot_long 100000========== 
Run # 0
duration =|1592552| msec. count=-325819012
==========Testing mandelbrot_long 100000========== 
Run # 0
duration =|1593111| msec. count=-325819012
==========Testing mandelbrot_long 100000========== 
Run # 0
duration =|1592993| msec. count=-325819012
==========Testing fannkuch_long 13========== 
Pfannkuchen(13) = 80
Duration |1100976|
==========Testing fannkuch_long 13========== 
Pfannkuchen(13) = 80
Duration |1100846|
==========Testing fannkuch_long 13========== 
Pfannkuchen(13) = 80
Duration |1100963|
==========Testing himeno_bench2 M========== 
mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
|26297|
|26298|
|26297|
|26220|
|26532|
|26953|
|26392|
|26219|
|26204|
|26251|
 Loop executed for 30 times
 Gosa : 0.0011446069938308213
 MFLOPS measured : 0.0  cpu : 263.6630001068115
cpu=|263.6630001068115| gosa=0.0011446069938308213
 Score based on Pentium III 600MHz using Fortran 77: 0.0
==========Testing himeno_bench2 M========== 
mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
|27001|
|26860|
|26875|
|26876|
|26876|
|26876|
|26875|
|26907|
|26923|
|26922|
 Loop executed for 30 times
 Gosa : 0.0011446069938308213
 MFLOPS measured : 0.0  cpu : 269.0069999694824
cpu=|269.0069999694824| gosa=0.0011446069938308213
 Score based on Pentium III 600MHz using Fortran 77: 0.0
==========Testing himeno_bench2 M========== 
mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
|26360|
|26329|
|26329|
|26329|
|26375|
|26329|
|26329|
|26329|
|26360|
|26313|
 Loop executed for 30 times
 Gosa : 0.0011446069938308213
 MFLOPS measured : 0.0  cpu : 263.382000207901
cpu=|263.382000207901| gosa=0.0011446069938308213
 Score based on Pentium III 600MHz using Fortran 77: 0.0
==========Testing nbody_long 1000000000========== 
-0.169075164
-0.169051539
Duration |719762|
==========Testing nbody_long 1000000000========== 
-0.169075164
-0.169051539
Duration |719793|
==========Testing nbody_long 1000000000========== 
-0.169075164
-0.169051539
Duration |719871|
==========Testing spectralnorm_long 30000========== 
1.274224153
Duration |319578|
==========Testing spectralnorm_long 30000========== 
1.274224153
Duration |319625|
==========Testing spectralnorm_long 30000========== 
1.274224153
Duration |319485|
