[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805034#action_12805034
]
Stanislaw Osinski commented on LUCENE-2221:
-------------------------------------------
I ran the benchmark on a 64bit Linux running an Intel(R) Xeon(R) E5520 @
2.27GHz. I tried both Sun's JDK 1.7-ea as well as JDK 1.6.0_18, which also has
support for native {{POPCNT}}.
*JDK 1.7-ea {{-server -XX:+UsePopCountInstruction}}*
{noformat}
# 1.7.0-ea-fastdebug, Java HotSpot(TM) 64-Bit Server VM, 17.0-b07-fastdebug,
Sun Microsystems Inc.,
Benchmark_BitUtil_trunk.test_pop_array : 15/20 rounds, time.total:
7.69, time.warmup: 1.96, time.bench: 5.73, round: 0.38 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_xor : 15/20 rounds, time.total:
11.13, time.warmup: 2.81, time.bench: 8.32, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_intersect : 15/20 rounds, time.total:
11.13, time.warmup: 2.82, time.bench: 8.32, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_andnot : 15/20 rounds, time.total:
10.46, time.warmup: 2.66, time.bench: 7.80, round: 0.52 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_union : 15/20 rounds, time.total:
11.13, time.warmup: 2.81, time.bench: 8.32, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_ntz_iterator_int : 5/7 rounds, time.total:
42.30, time.warmup: 12.02, time.bench: 30.29, round: 6.06 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_ntz_iterator_long : 5/7 rounds, time.total:
55.48, time.warmup: 15.43, time.bench: 40.05, round: 8.01 [+- 0.06], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.7.0-ea-fastdebug, Java HotSpot(TM) 64-Bit Server VM, 17.0-b07-fastdebug,
Sun Microsystems Inc.,
Benchmark_BitUtil_pop3264.test_pop_array : 15/20 rounds, time.total:
7.78, time.warmup: 2.05, time.bench: 5.73, round: 0.38 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_xor : 15/20 rounds, time.total:
11.13, time.warmup: 2.82, time.bench: 8.32, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_intersect : 15/20 rounds, time.total:
11.14, time.warmup: 2.82, time.bench: 8.32, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_andnot : 15/20 rounds, time.total:
10.46, time.warmup: 2.66, time.bench: 7.80, round: 0.52 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_union : 15/20 rounds, time.total:
11.13, time.warmup: 2.81, time.bench: 8.32, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.7.0-ea-fastdebug, Java HotSpot(TM) 64-Bit Server VM, 17.0-b07-fastdebug,
Sun Microsystems Inc.,
Benchmark_BitUtil_popNtzJRE.test_pop_array : 15/20 rounds, time.total:
5.06, time.warmup: 1.29, time.bench: 3.77, round: 0.25 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_xor : 15/20 rounds, time.total:
8.54, time.warmup: 2.15, time.bench: 6.39, round: 0.43 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_intersect : 15/20 rounds, time.total:
8.54, time.warmup: 2.15, time.bench: 6.39, round: 0.43 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_andnot : 15/20 rounds, time.total:
7.81, time.warmup: 1.99, time.bench: 5.81, round: 0.39 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_union : 15/20 rounds, time.total:
8.54, time.warmup: 2.15, time.bench: 6.39, round: 0.43 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_ntz_iterator_int : 5/7 rounds, time.total:
33.55, time.warmup: 8.72, time.bench: 24.83, round: 4.97 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_ntz_iterator_long: 5/7 rounds, time.total:
39.61, time.warmup: 11.48, time.bench: 28.12, round: 5.62 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.7.0-ea-fastdebug, Java HotSpot(TM) 64-Bit Server VM, 17.0-b07-fastdebug,
Sun Microsystems Inc.,
Benchmark_BitUtil_popNtzJRE_simple.test_pop_array : 15/20 rounds, time.total:
3.25, time.warmup: 0.82, time.bench: 2.43, round: 0.16 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_xor : 15/20 rounds, time.total:
5.05, time.warmup: 1.27, time.bench: 3.78, round: 0.25 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_intersect: 15/20 rounds,
time.total: 5.05, time.warmup: 1.27, time.bench: 3.78, round: 0.25 [+- 0.00],
round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_andnot: 15/20 rounds, time.total:
5.34, time.warmup: 1.34, time.bench: 4.00, round: 0.27 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_union : 15/20 rounds, time.total:
5.05, time.warmup: 1.27, time.bench: 3.78, round: 0.25 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
VM option '+UsePopCountInstruction'
{noformat}
*JDK 1.7-ea {{-server -XX:-UsePopCountInstruction}}*
{noformat}
# 1.7.0-ea-fastdebug, Java HotSpot(TM) 64-Bit Server VM, 17.0-b07-fastdebug,
Sun Microsystems Inc.,
Benchmark_BitUtil_trunk.test_pop_array : 15/20 rounds, time.total:
8.43, time.warmup: 2.29, time.bench: 6.13, round: 0.41 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_xor : 15/20 rounds, time.total:
11.97, time.warmup: 2.94, time.bench: 9.03, round: 0.60 [+- 0.03], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_intersect : 15/20 rounds, time.total:
12.88, time.warmup: 3.25, time.bench: 9.63, round: 0.64 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_andnot : 15/20 rounds, time.total:
12.19, time.warmup: 3.09, time.bench: 9.11, round: 0.61 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_union : 15/20 rounds, time.total:
12.87, time.warmup: 3.25, time.bench: 9.63, round: 0.64 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_ntz_iterator_int : 5/7 rounds, time.total:
46.44, time.warmup: 13.03, time.bench: 33.41, round: 6.68 [+- 0.01], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_ntz_iterator_long : 5/7 rounds, time.total:
56.51, time.warmup: 16.31, time.bench: 40.20, round: 8.04 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.7.0-ea-fastdebug, Java HotSpot(TM) 64-Bit Server VM, 17.0-b07-fastdebug,
Sun Microsystems Inc.,
Benchmark_BitUtil_pop3264.test_pop_array : 15/20 rounds, time.total:
9.09, time.warmup: 2.38, time.bench: 6.71, round: 0.45 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_xor : 15/20 rounds, time.total:
12.75, time.warmup: 3.22, time.bench: 9.54, round: 0.64 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_intersect : 15/20 rounds, time.total:
12.75, time.warmup: 3.21, time.bench: 9.54, round: 0.64 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_andnot : 15/20 rounds, time.total:
12.08, time.warmup: 3.06, time.bench: 9.02, round: 0.60 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_union : 15/20 rounds, time.total:
12.76, time.warmup: 3.22, time.bench: 9.54, round: 0.64 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.7.0-ea-fastdebug, Java HotSpot(TM) 64-Bit Server VM, 17.0-b07-fastdebug,
Sun Microsystems Inc.,
Benchmark_BitUtil_popNtzJRE.test_pop_array : 15/20 rounds, time.total:
8.98, time.warmup: 2.28, time.bench: 6.70, round: 0.45 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_xor : 15/20 rounds, time.total:
12.75, time.warmup: 3.21, time.bench: 9.54, round: 0.64 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_intersect : 15/20 rounds, time.total:
12.76, time.warmup: 3.22, time.bench: 9.54, round: 0.64 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_andnot : 15/20 rounds, time.total:
12.09, time.warmup: 3.06, time.bench: 9.03, round: 0.60 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_union : 15/20 rounds, time.total:
11.94, time.warmup: 3.22, time.bench: 8.72, round: 0.58 [+- 0.04], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_ntz_iterator_int : 5/7 rounds, time.total:
32.77, time.warmup: 8.92, time.bench: 23.84, round: 4.77 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_ntz_iterator_long: 5/7 rounds, time.total:
39.61, time.warmup: 11.49, time.bench: 28.12, round: 5.62 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.7.0-ea-fastdebug, Java HotSpot(TM) 64-Bit Server VM, 17.0-b07-fastdebug,
Sun Microsystems Inc.,
Benchmark_BitUtil_popNtzJRE_simple.test_pop_array : 15/20 rounds, time.total:
18.03, time.warmup: 4.52, time.bench: 13.51, round: 0.90 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_xor : 15/20 rounds, time.total:
19.65, time.warmup: 4.93, time.bench: 14.73, round: 0.98 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_intersect: 15/20 rounds,
time.total: 19.65, time.warmup: 4.93, time.bench: 14.73, round: 0.98 [+- 0.00],
round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_andnot: 15/20 rounds, time.total:
20.49, time.warmup: 5.14, time.bench: 15.35, round: 1.02 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_union : 15/20 rounds, time.total:
19.65, time.warmup: 4.93, time.bench: 14.73, round: 0.98 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
VM option '-UsePopCountInstruction'
{noformat}
*JDK 1.6.0_18 {{-server -XX:+UsePopCountInstruction}}*
{noformat}
# 1.6.0_18, Java HotSpot(TM) 64-Bit Server VM, 16.0-b13, Sun Microsystems Inc.,
Benchmark_BitUtil_trunk.test_pop_array : 15/20 rounds, time.total:
8.42, time.warmup: 2.13, time.bench: 6.30, round: 0.42 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_xor : 15/20 rounds, time.total:
12.12, time.warmup: 3.04, time.bench: 9.08, round: 0.61 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_intersect : 15/20 rounds, time.total:
12.12, time.warmup: 3.04, time.bench: 9.08, round: 0.61 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_andnot : 15/20 rounds, time.total:
11.53, time.warmup: 2.90, time.bench: 8.63, round: 0.58 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_union : 15/20 rounds, time.total:
12.12, time.warmup: 3.04, time.bench: 9.08, round: 0.61 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_ntz_iterator_int : 5/7 rounds, time.total:
46.85, time.warmup: 13.32, time.bench: 33.53, round: 6.71 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_ntz_iterator_long : 5/7 rounds, time.total:
57.93, time.warmup: 16.79, time.bench: 41.15, round: 8.23 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.6.0_18, Java HotSpot(TM) 64-Bit Server VM, 16.0-b13, Sun Microsystems Inc.,
Benchmark_BitUtil_pop3264.test_pop_array : 15/20 rounds, time.total:
8.43, time.warmup: 2.14, time.bench: 6.29, round: 0.42 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_xor : 15/20 rounds, time.total:
12.12, time.warmup: 3.04, time.bench: 9.08, round: 0.61 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_intersect : 15/20 rounds, time.total:
12.12, time.warmup: 3.04, time.bench: 9.08, round: 0.61 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_andnot : 15/20 rounds, time.total:
11.53, time.warmup: 2.90, time.bench: 8.63, round: 0.58 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_union : 15/20 rounds, time.total:
12.12, time.warmup: 3.04, time.bench: 9.08, round: 0.61 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.6.0_18, Java HotSpot(TM) 64-Bit Server VM, 16.0-b13, Sun Microsystems Inc.,
Benchmark_BitUtil_popNtzJRE.test_pop_array : 15/20 rounds, time.total:
5.57, time.warmup: 1.42, time.bench: 4.15, round: 0.28 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_xor : 15/20 rounds, time.total:
9.43, time.warmup: 2.36, time.bench: 7.07, round: 0.47 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_intersect : 15/20 rounds, time.total:
9.43, time.warmup: 2.36, time.bench: 7.07, round: 0.47 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_andnot : 15/20 rounds, time.total:
8.64, time.warmup: 2.18, time.bench: 6.46, round: 0.43 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_union : 15/20 rounds, time.total:
9.42, time.warmup: 2.36, time.bench: 7.07, round: 0.47 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_ntz_iterator_int : 5/7 rounds, time.total:
34.36, time.warmup: 9.70, time.bench: 24.66, round: 4.93 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_ntz_iterator_long: 5/7 rounds, time.total:
43.89, time.warmup: 13.01, time.bench: 30.88, round: 6.18 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.6.0_18, Java HotSpot(TM) 64-Bit Server VM, 16.0-b13, Sun Microsystems Inc.,
Benchmark_BitUtil_popNtzJRE_simple.test_pop_array : 15/20 rounds, time.total:
3.58, time.warmup: 0.90, time.bench: 2.69, round: 0.18 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_xor : 15/20 rounds, time.total:
5.56, time.warmup: 1.39, time.bench: 4.16, round: 0.28 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_intersect: 15/20 rounds,
time.total: 5.56, time.warmup: 1.39, time.bench: 4.16, round: 0.28 [+- 0.00],
round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_andnot: 15/20 rounds, time.total:
5.89, time.warmup: 1.48, time.bench: 4.42, round: 0.29 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_union : 15/20 rounds, time.total:
5.56, time.warmup: 1.39, time.bench: 4.16, round: 0.28 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
{noformat}
*JDK 1.6.0_18 {{-server -XX:-UsePopCountInstruction}}*
{noformat}
# 1.6.0_18, Java HotSpot(TM) 64-Bit Server VM, 16.0-b13, Sun Microsystems Inc.,
Benchmark_BitUtil_trunk.test_pop_array : 15/20 rounds, time.total:
7.67, time.warmup: 1.94, time.bench: 5.74, round: 0.38 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_xor : 15/20 rounds, time.total:
11.02, time.warmup: 2.77, time.bench: 8.25, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_intersect : 15/20 rounds, time.total:
11.02, time.warmup: 2.77, time.bench: 8.25, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_andnot : 15/20 rounds, time.total:
10.49, time.warmup: 2.64, time.bench: 7.85, round: 0.52 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_pop_union : 15/20 rounds, time.total:
11.02, time.warmup: 2.77, time.bench: 8.25, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_ntz_iterator_int : 5/7 rounds, time.total:
42.61, time.warmup: 12.11, time.bench: 30.49, round: 6.10 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_trunk.test_ntz_iterator_long : 5/7 rounds, time.total:
52.69, time.warmup: 15.27, time.bench: 37.42, round: 7.48 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.6.0_18, Java HotSpot(TM) 64-Bit Server VM, 16.0-b13, Sun Microsystems Inc.,
Benchmark_BitUtil_pop3264.test_pop_array : 15/20 rounds, time.total:
7.69, time.warmup: 1.95, time.bench: 5.74, round: 0.38 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_xor : 15/20 rounds, time.total:
11.02, time.warmup: 2.76, time.bench: 8.25, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_intersect : 15/20 rounds, time.total:
11.02, time.warmup: 2.77, time.bench: 8.25, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_andnot : 15/20 rounds, time.total:
10.48, time.warmup: 2.64, time.bench: 7.84, round: 0.52 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_pop3264.test_pop_union : 15/20 rounds, time.total:
11.02, time.warmup: 2.76, time.bench: 8.25, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.6.0_18, Java HotSpot(TM) 64-Bit Server VM, 16.0-b13, Sun Microsystems Inc.,
Benchmark_BitUtil_popNtzJRE.test_pop_array : 15/20 rounds, time.total:
7.67, time.warmup: 1.94, time.bench: 5.74, round: 0.38 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_xor : 15/20 rounds, time.total:
11.02, time.warmup: 2.76, time.bench: 8.25, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_intersect : 15/20 rounds, time.total:
11.02, time.warmup: 2.77, time.bench: 8.25, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_andnot : 15/20 rounds, time.total:
10.48, time.warmup: 2.64, time.bench: 7.84, round: 0.52 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_pop_union : 15/20 rounds, time.total:
11.02, time.warmup: 2.77, time.bench: 8.25, round: 0.55 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_ntz_iterator_int : 5/7 rounds, time.total:
31.27, time.warmup: 8.83, time.bench: 22.44, round: 4.49 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE.test_ntz_iterator_long: 5/7 rounds, time.total:
39.90, time.warmup: 11.84, time.bench: 28.06, round: 5.61 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
# 1.6.0_18, Java HotSpot(TM) 64-Bit Server VM, 16.0-b13, Sun Microsystems Inc.,
Benchmark_BitUtil_popNtzJRE_simple.test_pop_array : 15/20 rounds, time.total:
18.03, time.warmup: 4.51, time.bench: 13.52, round: 0.90 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_xor : 15/20 rounds, time.total:
21.25, time.warmup: 5.01, time.bench: 16.24, round: 1.08 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_intersect: 15/20 rounds,
time.total: 21.67, time.warmup: 5.43, time.bench: 16.25, round: 1.08 [+- 0.00],
round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_andnot: 15/20 rounds, time.total:
22.43, time.warmup: 5.62, time.bench: 16.81, round: 1.12 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
Benchmark_BitUtil_popNtzJRE_simple.test_pop_union : 15/20 rounds, time.total:
21.65, time.warmup: 5.42, time.bench: 16.23, round: 1.08 [+- 0.00], round.gc:
0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
{noformat}
> Micro-benchmarks for ntz and pop (BitUtils) operations.
> -------------------------------------------------------
>
> Key: LUCENE-2221
> URL: https://issues.apache.org/jira/browse/LUCENE-2221
> Project: Lucene - Java
> Issue Type: Task
> Components: Other
> Reporter: Dawid Weiss
> Priority: Trivial
> Attachments: benchmark.jar, benchmarks.txt,
> lucene-bitset-benchmarks.zip, results-popntz.txt
>
>
> As suggested by Yonik, I performed a suite of micro-benchmarks to investigate
> the following:
> * pop() (bitCount) seems to be implemented in the same way ("hacker's
> delight") as in the BitUtils class (SUN's standard library from version 1.5).
> * native intrinsics have been recently added to the HotSpot that should speed
> up bitCount significantly.
> I have tried to run the code on various VMs and architectures, but of course
> the results may vary depending on the setting.
> h2. Micro-benchmark code, evaluation
> * The micro-benchmarks were written as JUnit tests with a custom set of rules
> that repeats each test measuring execution time, gc use, etc.
> * There were 5 warmup runs before each test, followed by 15 benchmarked runs.
> The result contain overall times, round times and standard deviations where
> applicable.
> * There were several tests for isolated performance of {{BitUtil.pop()}},
> JDK's {{bitCount}}, {{BitUtil.ntz()}}, {{BitUtil.ntz2()}}, {{BitUtil.ntz3()}}
> and JDK's {{numberOfTrailingZeros}}, the test code had the following loop:
> {code}
> final long [] longs = NtzPopBenchmark.random.bits;
> int cnt = 0;
> for (int i = longs.length; --i >= 0;)
> {
> cnt += Long.bitCount(longs[i]);
> }
> volatileVariable = cnt; // to prevent dead code removal.
> {code}
> * I also added another version of pop() based on a precomputed bit counts.
> This version was called {{pop2}}.
> * The input array of long values was initialized to a memory taking 200MB.
> There were two different sets: {{random}} (random values) and {{single}}
> (single bit rotated to the right in each long).
> * There were tests that called {{BitUtil.pop_xor}} between the two input
> bitsets (random, single).
> * Additional tests that used iterators and and other BitUtil operations
> showed similar performance to isolated loops, so I omit them here.
> h2. Evaluation environment
> I tested on three different machines:
> * Pentium 4, 32-bit, 3GHZ, 2GB RAM (Windows)
> * AMD Athlon(tm) 64 X2 Dual Core Processor 5200+, 64-bit, 4GB RAM (Ubuntu)
> * Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz, 64-bit, 4GB RAM (Ubuntu)
> and on various VMs:
> * 1.6.0_17, Java HotSpot(TM) Server VM, 14.3-b01, Sun Microsystems Inc.,
> * 1.5.0_18, Java HotSpot(TM) Server VM, 1.5.0_18-b02, Sun Microsystems Inc.,
> * 1.7.0-ea, Java HotSpot(TM) Server VM, 17.0-b06, Sun Microsystems Inc.,
> * 1.6.0, IBM J9 VM, 2.4, IBM Corporation,
> * BEA JRockit.
> * (ant other minor versions of the VMs above, depending on the computer).
> h2. Results overview
> h3. {{pop}}
> The times between {{BitUtil}} and JDK were mostly identical. However, on
> 32-bit systems, precached {{pop2}} performed
> *much* better. Examples:
> {noformat}
> # 1.6.0_17, Java HotSpot(TM) Server VM, 14.3-b01, Sun Microsystems Inc.,
> test_POP_JDK_random : 15/20 rounds, time.total: 15.61, time.warmup:
> 4.31, time.bench: 11.30, round: 0.75 [+- 0.02]
> test_POP_JDK_single : 15/20 rounds, time.total: 15.67, time.warmup:
> 4.31, time.bench: 11.36, round: 0.76 [+- 0.02]
> test_POP_BitUtil_random : 15/20 rounds, time.total: 15.55, time.warmup:
> 4.33, time.bench: 11.22, round: 0.75 [+- 0.01]
> test_POP_BitUtil_single : 15/20 rounds, time.total: 15.55, time.warmup:
> 4.31, time.bench: 11.23, round: 0.75 [+- 0.01]
> test_POP2_random : 15/20 rounds, time.total: 6.69, time.warmup:
> 1.75, time.bench: 4.94, round: 0.33 [+- 0.00]
> test_POP2_single : 15/20 rounds, time.total: 4.66, time.warmup:
> 1.22, time.bench: 3.44, round: 0.23 [+- 0.01]
> {noformat}
> Note the difference between random and single distributions -- most probably
> due to more cache hits when referring to the
> lookup table. Other VMs on this 32-bit machine:
> {noformat}
> # 1.5.0_18, Java HotSpot(TM) Server VM, 1.5.0_18-b02, Sun Microsystems Inc.,
> test_POP_JDK_random : 15/20 rounds, time.total: 20.67, time.warmup:
> 5.19, time.bench: 15.48, round: 1.03 [+- 0.01]
> test_POP_JDK_single : 15/20 rounds, time.total: 22.70, time.warmup:
> 5.63, time.bench: 17.08, round: 1.14 [+- 0.01]
> test_POP_BitUtil_random : 15/20 rounds, time.total: 22.69, time.warmup:
> 5.63, time.bench: 17.06, round: 1.14 [+- 0.01]
> test_POP_BitUtil_single : 15/20 rounds, time.total: 20.67, time.warmup:
> 5.19, time.bench: 15.48, round: 1.03 [+- 0.01]
> test_POP2_random : 15/20 rounds, time.total: 6.30, time.warmup:
> 1.63, time.bench: 4.67, round: 0.31 [+- 0.01]
> test_POP2_single : 15/20 rounds, time.total: 4.33, time.warmup:
> 1.16, time.bench: 3.17, round: 0.21 [+- 0.01]
> # 1.7.0-ea, Java HotSpot(TM) Server VM, 17.0-b06, Sun Microsystems Inc.,
> test_POP_JDK_random : 15/20 rounds, time.total: 15.28, time.warmup:
> 4.25, time.bench: 11.03, round: 0.74 [+- 0.03]
> test_POP_JDK_single : 15/20 rounds, time.total: 15.16, time.warmup:
> 4.20, time.bench: 10.95, round: 0.73 [+- 0.01]
> test_POP_BitUtil_random : 15/20 rounds, time.total: 15.12, time.warmup:
> 4.20, time.bench: 10.92, round: 0.73 [+- 0.01]
> test_POP_BitUtil_single : 15/20 rounds, time.total: 15.13, time.warmup:
> 4.25, time.bench: 10.88, round: 0.73 [+- 0.01]
> test_POP2_random : 15/20 rounds, time.total: 6.78, time.warmup:
> 1.72, time.bench: 5.06, round: 0.34 [+- 0.01]
> test_POP2_single : 15/20 rounds, time.total: 4.72, time.warmup:
> 1.20, time.bench: 3.52, round: 0.23 [+- 0.02]
> {noformat}
> On 64-bit machines, the results are nearly equal, with pop2 performing
> slightly worse on SUN's 1.6 compared to JDK and BitUtil
> (but this difference is really tiny and not present on all VMs; see IBM J9
> and SUN's 1.5 below).
> {noformat}
> # 1.6.0_16, Java HotSpot(TM) 64-Bit Server VM, 14.2-b01, Sun Microsystems
> Inc.,
> test_POP_JDK_random : 15/20 rounds, time.total: 3.27, time.warmup:
> 0.81, time.bench: 2.46, round: 0.16 [+- 0.00]
> test_POP_JDK_single : 15/20 rounds, time.total: 3.11, time.warmup:
> 0.76, time.bench: 2.34, round: 0.16 [+- 0.02]
> test_POP_BitUtil_random : 15/20 rounds, time.total: 3.27, time.warmup:
> 0.81, time.bench: 2.46, round: 0.16 [+- 0.00]
> test_POP_BitUtil_single : 15/20 rounds, time.total: 3.03, time.warmup:
> 0.77, time.bench: 2.26, round: 0.15 [+- 0.00]
> test_POP2_random : 15/20 rounds, time.total: 3.63, time.warmup:
> 0.93, time.bench: 2.70, round: 0.18 [+- 0.00]
> test_POP2_single : 15/20 rounds, time.total: 3.51, time.warmup:
> 0.89, time.bench: 2.62, round: 0.17 [+- 0.00]
> # 1.6.0, IBM J9 VM, 2.4, IBM Corporation,
> test_POP_JDK_random : 15/20 rounds, time.total: 4.80, time.warmup:
> 1.24, time.bench: 3.57, round: 0.24 [+- 0.01]
> test_POP_JDK_single : 15/20 rounds, time.total: 5.00, time.warmup:
> 1.44, time.bench: 3.56, round: 0.24 [+- 0.01]
> test_POP_BitUtil_random : 15/20 rounds, time.total: 4.81, time.warmup:
> 1.24, time.bench: 3.56, round: 0.24 [+- 0.01]
> test_POP_BitUtil_single : 15/20 rounds, time.total: 4.75, time.warmup:
> 1.19, time.bench: 3.56, round: 0.24 [+- 0.01]
> test_POP2_random : 15/20 rounds, time.total: 3.65, time.warmup:
> 0.90, time.bench: 2.76, round: 0.18 [+- 0.00]
> test_POP2_single : 15/20 rounds, time.total: 3.82, time.warmup:
> 0.93, time.bench: 2.89, round: 0.19 [+- 0.01]
> # 1.5.0_18, Java HotSpot(TM) 64-Bit Server VM, 1.5.0_18-b02, Sun Microsystems
> Inc.,
> test_POP_JDK_random : 15/20 rounds, time.total: 3.72, time.warmup:
> 0.94, time.bench: 2.78, round: 0.19 [+- 0.00]
> test_POP_JDK_single : 15/20 rounds, time.total: 5.96, time.warmup:
> 1.40, time.bench: 4.56, round: 0.30 [+- 0.00]
> test_POP_BitUtil_random : 15/20 rounds, time.total: 6.16, time.warmup:
> 1.43, time.bench: 4.73, round: 0.31 [+- 0.00]
> test_POP_BitUtil_single : 15/20 rounds, time.total: 3.62, time.warmup:
> 0.92, time.bench: 2.70, round: 0.18 [+- 0.00]
> test_POP2_random : 15/20 rounds, time.total: 3.70, time.warmup:
> 0.96, time.bench: 2.74, round: 0.18 [+- 0.00]
> test_POP2_single : 15/20 rounds, time.total: 3.57, time.warmup:
> 0.93, time.bench: 2.65, round: 0.18 [+- 0.00]
> {noformat}
> The other 64-bit machine (quad-core):
> {noformat}
> # 1.7.0-ea, Java HotSpot(TM) 64-Bit Server VM, 17.0-b06, Sun Microsystems
> Inc.,
> test_POP_JDK_random : 15/20 rounds, time.total: 2.46, time.warmup:
> 0.62, time.bench: 1.84, round: 0.12 [+- 0.00]
> test_POP_JDK_single : 15/20 rounds, time.total: 2.49, time.warmup:
> 0.62, time.bench: 1.87, round: 0.12 [+- 0.01]
> test_POP_BitUtil_random : 15/20 rounds, time.total: 2.47, time.warmup:
> 0.62, time.bench: 1.84, round: 0.12 [+- 0.00]
> test_POP_BitUtil_single : 15/20 rounds, time.total: 2.46, time.warmup:
> 0.62, time.bench: 1.84, round: 0.12 [+- 0.00]
> test_POP2_random : 15/20 rounds, time.total: 2.82, time.warmup:
> 0.71, time.bench: 2.11, round: 0.14 [+- 0.00]
> test_POP2_single : 15/20 rounds, time.total: 2.15, time.warmup:
> 0.55, time.bench: 1.61, round: 0.11 [+- 0.00]
> {noformat}
> I then replaced {{BitUtil.pop}} with {{BitUtil.pop2}} in bit-counting methods
> like xor/and/or. The results are intriguing.
> On 32-bit systems, there is a measureable gain, like here:
> {noformat}
> # 1.6.0_17, Java HotSpot(TM) Server VM, 14.3-b01, Sun Microsystems Inc.,
> test_pop_xor : 15/20 rounds, time.total: 9.78, time.warmup:
> 2.59, time.bench: 7.19, round: 0.48 [+- 0.01]
> test_pop2_hd_xor : 15/20 rounds, time.total: 8.27, time.warmup:
> 2.22, time.bench: 6.05, round: 0.40 [+- 0.01]
> # 1.7.0-ea, Java HotSpot(TM) Server VM, 17.0-b06, Sun Microsystems Inc.,
> test_pop_xor : 15/20 rounds, time.total: 9.89, time.warmup:
> 2.59, time.bench: 7.30, round: 0.49 [+- 0.02]
> test_pop2_hd_xor : 15/20 rounds, time.total: 8.20, time.warmup:
> 2.24, time.bench: 5.97, round: 0.40 [+- 0.01]
> {noformat}
> On 64-bit systems, when 64-bit values can be manipulated directly in
> registers, there was nearly no speedup or even
> a small performance penalty like in here:
> {noformat}
> # 1.7.0-ea, Java HotSpot(TM) 64-Bit Server VM, 17.0-b06, Sun Microsystems
> Inc.,
> test_pop_xor : 15/20 rounds, time.total: 1.76, time.warmup:
> 0.49, time.bench: 1.27, round: 0.09 [+- 0.00]
> test_pop2_hd_xor : 15/20 rounds, time.total: 2.06, time.warmup:
> 0.55, time.bench: 1.51, round: 0.10 [+- 0.00]
> {noformat}
> I'm guessing referencing memory on this fast processors is slower than
> manipulating registers.
> h3. {{ntz}}
> On JVMs prior to 1.7, the {{ntz} version from Lucene was much faster in my
> tests than the one from JDK,
> but it also has a greater variance depending on the input bits' distribution
> (compare the same routine
> for random and single below).
> {noformat}
> # 32-bit system;
> # 1.6.0_17, Java HotSpot(TM) Server VM, 14.3-b01, Sun Microsystems Inc.,
> test_NTZ_JDK_random : 15/20 rounds, time.total: 6.69, time.warmup:
> 1.73, time.bench: 4.95, round: 0.33 [+- 0.01]
> test_NTZ_JDK_single : 15/20 rounds, time.total: 7.59, time.warmup:
> 1.94, time.bench: 5.66, round: 0.38 [+- 0.01]
> test_NTZ_BitUtil_random : 15/20 rounds, time.total: 2.72, time.warmup:
> 0.73, time.bench: 1.98, round: 0.13 [+- 0.02]
> test_NTZ_BitUtil_single : 15/20 rounds, time.total: 5.28, time.warmup:
> 1.34, time.bench: 3.94, round: 0.26 [+- 0.02]
> test_NTZ2_BitUtil_random : 15/20 rounds, time.total: 3.06, time.warmup:
> 0.81, time.bench: 2.25, round: 0.15 [+- 0.01]
> test_NTZ2_BitUtil_single : 15/20 rounds, time.total: 5.36, time.warmup:
> 1.34, time.bench: 4.02, round: 0.27 [+- 0.01]
> test_NTZ3_BitUtil_random : 15/20 rounds, time.total: 5.80, time.warmup:
> 1.48, time.bench: 4.31, round: 0.29 [+- 0.01]
> test_NTZ3_BitUtil_single : 15/20 rounds, time.total: 6.98, time.warmup:
> 1.81, time.bench: 5.17, round: 0.34 [+- 0.01]
> # 64-bit Athlon
> # 1.6.0_16, Java HotSpot(TM) 64-Bit Server VM, 14.2-b01, Sun Microsystems
> Inc.,
> test_NTZ_JDK_random : 15/20 rounds, time.total: 4.59, time.warmup:
> 1.16, time.bench: 3.44, round: 0.23 [+- 0.00]
> test_NTZ_JDK_single : 15/20 rounds, time.total: 6.64, time.warmup:
> 1.59, time.bench: 5.04, round: 0.34 [+- 0.01]
> test_NTZ_BitUtil_random : 15/20 rounds, time.total: 2.09, time.warmup:
> 0.53, time.bench: 1.56, round: 0.10 [+- 0.00]
> test_NTZ_BitUtil_single : 15/20 rounds, time.total: 3.87, time.warmup:
> 0.98, time.bench: 2.90, round: 0.19 [+- 0.00]
> test_NTZ2_BitUtil_random : 15/20 rounds, time.total: 2.09, time.warmup:
> 0.52, time.bench: 1.57, round: 0.10 [+- 0.00]
> test_NTZ2_BitUtil_single : 15/20 rounds, time.total: 3.31, time.warmup:
> 0.84, time.bench: 2.47, round: 0.16 [+- 0.00]
> test_NTZ3_BitUtil_random : 15/20 rounds, time.total: 3.31, time.warmup:
> 0.83, time.bench: 2.48, round: 0.17 [+- 0.00]
> test_NTZ3_BitUtil_single : 15/20 rounds, time.total: 5.71, time.warmup:
> 1.39, time.bench: 4.32, round: 0.29 [+- 0.00]
> {noformat}
> But then comes the 1.7 HotSport and things change radically, on 32-bit system
> the JDK's version is much faster for nearly-empty
> {{long}} values:
> {noformat}
> # 1.7.0-ea, Java HotSpot(TM) Server VM, 17.0-b06, Sun Microsystems Inc.,
> test_NTZ_JDK_random : 15/20 rounds, time.total: 1.97, time.warmup:
> 0.61, time.bench: 1.36, round: 0.09 [+- 0.01]
> test_NTZ_JDK_single : 15/20 rounds, time.total: 2.53, time.warmup:
> 0.77, time.bench: 1.77, round: 0.12 [+- 0.01]
> test_NTZ_BitUtil_random : 15/20 rounds, time.total: 2.36, time.warmup:
> 0.66, time.bench: 1.70, round: 0.11 [+- 0.01]
> test_NTZ_BitUtil_single : 15/20 rounds, time.total: 4.50, time.warmup:
> 1.19, time.bench: 3.31, round: 0.22 [+- 0.01]
> test_NTZ2_BitUtil_random : 15/20 rounds, time.total: 3.08, time.warmup:
> 0.81, time.bench: 2.27, round: 0.15 [+- 0.01]
> test_NTZ2_BitUtil_single : 15/20 rounds, time.total: 4.97, time.warmup:
> 1.28, time.bench: 3.69, round: 0.25 [+- 0.01]
> test_NTZ3_BitUtil_random : 15/20 rounds, time.total: 5.78, time.warmup:
> 1.48, time.bench: 4.30, round: 0.29 [+- 0.01]
> test_NTZ3_BitUtil_single : 15/20 rounds, time.total: 7.77, time.warmup:
> 1.91, time.bench: 5.86, round: 0.39 [+- 0.01]
> {noformat}
> On the 64-bit quad core:
> {noformat}
> # 1.6.0_13, Java HotSpot(TM) 64-Bit Server VM, 11.3-b02, Sun Microsystems
> Inc.,
> test_NTZ_JDK_random : 15/20 rounds, time.total: 3.92, time.warmup:
> 0.97, time.bench: 2.94, round: 0.20 [+- 0.00]
> test_NTZ_JDK_single : 15/20 rounds, time.total: 3.80, time.warmup:
> 0.97, time.bench: 2.82, round: 0.19 [+- 0.00]
> test_NTZ_BitUtil_random : 15/20 rounds, time.total: 0.96, time.warmup:
> 0.25, time.bench: 0.71, round: 0.05 [+- 0.00]
> test_NTZ_BitUtil_single : 15/20 rounds, time.total: 2.74, time.warmup:
> 0.69, time.bench: 2.04, round: 0.14 [+- 0.00]
> test_NTZ2_BitUtil_random : 15/20 rounds, time.total: 1.22, time.warmup:
> 0.31, time.bench: 0.91, round: 0.06 [+- 0.00]
> test_NTZ2_BitUtil_single : 15/20 rounds, time.total: 2.18, time.warmup:
> 0.56, time.bench: 1.62, round: 0.11 [+- 0.00]
> test_NTZ3_BitUtil_random : 15/20 rounds, time.total: 2.76, time.warmup:
> 0.71, time.bench: 2.06, round: 0.14 [+- 0.00]
> test_NTZ3_BitUtil_single : 15/20 rounds, time.total: 3.47, time.warmup:
> 0.91, time.bench: 2.56, round: 0.17 [+- 0.01]
> {noformat}
> And then comes the 1.7, compare JDK's implementation with anything else
> (especially the {{time.bench}} for
> the {{single}} input data. Looks like this is hardware-accelerated.
> {noformat}
> # -server -Xbatch -Xmx1024m
> # 1.7.0-ea, Java HotSpot(TM) 64-Bit Server VM, 17.0-b06, Sun Microsystems
> Inc.,
> test_NTZ_JDK_random : 15/20 rounds, time.total: 0.79, time.warmup:
> 0.21, time.bench: 0.58, round: 0.04 [+- 0.00]
> test_NTZ_JDK_single : 15/20 rounds, time.total: 0.75, time.warmup:
> 0.20, time.bench: 0.55, round: 0.04 [+- 0.00]
> test_NTZ_BitUtil_random : 15/20 rounds, time.total: 0.98, time.warmup:
> 0.25, time.bench: 0.72, round: 0.05 [+- 0.00]
> test_NTZ_BitUtil_single : 15/20 rounds, time.total: 2.61, time.warmup:
> 0.66, time.bench: 1.95, round: 0.13 [+- 0.00]
> test_NTZ2_BitUtil_random : 15/20 rounds, time.total: 1.30, time.warmup:
> 0.33, time.bench: 0.97, round: 0.06 [+- 0.00]
> test_NTZ2_BitUtil_single : 15/20 rounds, time.total: 2.48, time.warmup:
> 0.61, time.bench: 1.88, round: 0.13 [+- 0.00]
> test_NTZ3_BitUtil_random : 15/20 rounds, time.total: 2.81, time.warmup:
> 0.70, time.bench: 2.11, round: 0.14 [+- 0.00]
> test_NTZ3_BitUtil_single : 15/20 rounds, time.total: 4.07, time.warmup:
> 1.02, time.bench: 3.05, round: 0.20 [+- 0.00]
> {noformat}
> h2. Conclusions
> It seems that any change introduced to these routines will hurt somebody in
> some configuration, so it's really hard
> for me to make choices. I would definitely opt for the precached {{pop2}}
> version on 32-bit systems as it seems to
> be always faster or equally fast compared to other bit counting options.
> {{pop2}} looked like this:
> {code}
> private static byte [] bcounts = new byte [0x10000];
> static
> {
> for (int i = 0x10000; --i >= 0;)
> bcounts[i] = (byte) Integer.bitCount(i);
> }
> public static int pop2(long v)
> {
> int t;
> return
> bcounts[(t = (int) v) & 0xffff]
> + bcounts[t >>> 16]
> + bcounts[(t = ((int) (v >>> 32))) & 0xffff]
> + bcounts[t >>> 16];
> }
> {code}
> As for the hardware-accelerated {{ntz}}, if one can detect 1.7, then using
> the JDK's version is speeding up things
> significantly. But I have not checked how this detection would affect speed
> if done at run-time (I assume a final
> static flag wouldn't cause any performance penalty) and it is definitely not
> worth replacing for folks with older
> VMs.
> h2. Raw results data.
> I will attach raw results as part of the issue if you want to draw your own
> conclusions. Didn't have access to sparc-machine
> or to any machine with the newest Intels.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]