On 03/22/2014 02:01 AM, Brian Burkhalter wrote:
On Mar 20, 2014, at 12:49 AM, Aleksey Shipilev
<aleksey.shipi...@oracle.com <mailto:aleksey.shipi...@oracle.com>> wrote:
On 03/20/2014 11:06 AM, Peter Levart wrote:
I was thinking about last night, for question: "Why is this
double-checked non-volatile-then-volatile trick not any faster than pure
volatile variant even on ARM platform where volatile read should have
some penalty compared to normal read?", might be in the fact that
Raspberry Pi is a single-core/single-thread "machine". Would anyone with
JVM JIT compiler expertise care to share some insight? I suspect that on
such platform, the compiler optimizes volatile accesses so that they are
performed without otherwise necessary memory fences...
Yes, at least C2 is known to not emit memory fences on uniprocessor
machines. You need to have a multicore ARM. If you are still interested,
contact me privately and I can arrange the access to my personal
quad-core Cortex-A9.
This is all very interesting but I would like to have closure so to
speak on the patches that I posted.
Hi Brian,
I think the questions still unresolved are:
1) Is a change of toString() necessary?
I can't speak of that since I don't have empirical data in what
proportion of code it would be beneficial. I can imagine situations
where returned strings are used as keys in HashMap and String's equals
is optimized for same-instance comparison, but I don't know if such
situations happen in practice...
2) If #1 is “yes” then which version of the patch? (I think the most
recent)
Aleksey Shipilev kindly let me access it's quad-code ARM "Big Iron" and
before the connection to it broke yesterday, I managed to get the
results of a single-threaded run of the microbenchmark. While
differences were not spectacular, they were measurable (sorry I haven't
saved the results). The variant with volatile stringCache field and CAS
was a little slower than the double-checked
nonvolatile-then-volatile-read-and-CAS trick.
It's also important that the fast-path method does not exceed the
maximum bytecode size for inlining (35 bytes by default, I think) so the
variant with two methods toString/toStringSlow is necessary to avoid
performance regression.
3) Are the other changes OK which are unrelated to toString()?
Looks good. Just a nit. In the following method:
3726 private static void matchScale(BigDecimal[] val) {
3727 if (val[0].scale == val[1].scale) {
3728 // return
3729 } else if (val[0].scale < val[1].scale) {
3730 val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
3731 } else if (val[1].scale < val[0].scale) {
3732 val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
3733 }
3734 }
One of 3 ifs is superfluous. Either the 1st one:
private static void matchScale(BigDecimal[] val) {
*/* if (val[0].scale == val[1].scale) {
// return
} else */* if (val[0].scale < val[1].scale) {
val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
} else if (val[1].scale < val[0].scale) {
val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
}
}
...or the last one:
private static void matchScale(BigDecimal[] val) {
if (val[0].scale == val[1].scale) {
// return
} else if (val[0].scale < val[1].scale) {
val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
} else*/* if (val[1].scale < val[0].scale) */* {
val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
}
}
Regards, Peter
Thanks,
Brian