On 03/22/2014 02:01 AM, Brian Burkhalter wrote:
On Mar 20, 2014, at 12:49 AM, Aleksey Shipilev <aleksey.shipi...@oracle.com <mailto:aleksey.shipi...@oracle.com>> wrote:

On 03/20/2014 11:06 AM, Peter Levart wrote:
I was thinking about last night, for question: "Why is this
double-checked non-volatile-then-volatile trick not any faster than pure
volatile variant even on ARM platform where volatile read should have
some penalty compared to normal read?", might be in the fact that
Raspberry Pi is a single-core/single-thread "machine". Would anyone with
JVM JIT compiler expertise care to share some insight? I suspect that on
such platform, the compiler optimizes volatile accesses so that they are
performed without otherwise necessary memory fences...

Yes, at least C2 is known to not emit memory fences on uniprocessor
machines. You need to have a multicore ARM. If you are still interested,
contact me privately and I can arrange the access to my personal
quad-core Cortex-A9.

This is all very interesting but I would like to have closure so to speak on the patches that I posted.

Hi Brian,

I think the questions still unresolved are:

1) Is a change of toString() necessary?

I can't speak of that since I don't have empirical data in what proportion of code it would be beneficial. I can imagine situations where returned strings are used as keys in HashMap and String's equals is optimized for same-instance comparison, but I don't know if such situations happen in practice...

2) If #1 is “yes” then which version of the patch? (I think the most recent)

Aleksey Shipilev kindly let me access it's quad-code ARM "Big Iron" and before the connection to it broke yesterday, I managed to get the results of a single-threaded run of the microbenchmark. While differences were not spectacular, they were measurable (sorry I haven't saved the results). The variant with volatile stringCache field and CAS was a little slower than the double-checked nonvolatile-then-volatile-read-and-CAS trick.

It's also important that the fast-path method does not exceed the maximum bytecode size for inlining (35 bytes by default, I think) so the variant with two methods toString/toStringSlow is necessary to avoid performance regression.

3) Are the other changes OK which are unrelated to toString()?

Looks good. Just a nit. In the following method:

3726     private static void matchScale(BigDecimal[] val) {
3727         if (val[0].scale == val[1].scale) {
3728             // return
3729         } else if (val[0].scale < val[1].scale) {
3730             val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
3731         } else if (val[1].scale < val[0].scale) {
3732             val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
3733         }
3734     }


One of 3 ifs is superfluous. Either the 1st one:

    private static void matchScale(BigDecimal[] val) {
        */* if (val[0].scale == val[1].scale) {
            // return
        } else */*  if (val[0].scale < val[1].scale) {
            val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
        } else if (val[1].scale < val[0].scale) {
            val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
        }
    }


...or the last one:

    private static void matchScale(BigDecimal[] val) {
        if (val[0].scale == val[1].scale) {
            // return
        } else if (val[0].scale < val[1].scale) {
            val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
        } else*/* if (val[1].scale < val[0].scale) */*  {
            val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
        }
    }



Regards, Peter


Thanks,

Brian

Reply via email to