Hi Aleksey and Hans, Thank you for your comments. I'll try to see how much Unsafe approach improves performance.
I'm now thinking the approach to use final that Aleksey mentioned in the first reply is a good one. I checked the JIT generated code. It puts MEMBAR-store-store and MEMBAR-release before storing ClassDataSlot array to the final field "slots", and there is no MEMBAR when reading dataLayout or dataLayout.slots. Since dataLayout is heavily read than written, I think it is preferable if we can put all overhead into the writing side. But I'll see how performance changes with lwsync on reading it. private DataLayout dataLayout; /** * Class to ensure elements of ClassDataSlot[] are visible to other * threads. The "final" qualifier of the variable slots is necessary. */ private static class DataLayout { final ClassDataSlot[] slots; DataLayout(ClassDataSlot[] s) { slots = s; } } ClassDataSlot[] getClassDataLayout() throws InvalidClassException { // REMIND: synchronize instead of relying on volatile? if (dataLayout == null) { ClassDataSlot[] slots = getClassDataLayout0(); dataLayout = new DataLayout(slots); } return dataLayout.slots; } Regards, Ogata From: Aleksey Shipilev <sh...@redhat.com> To: Hans Boehm <hbo...@google.com> Cc: Kazunori Ogata <oga...@jp.ibm.com>, core-libs-dev <core-libs-dev@openjdk.java.net> Date: 2017/09/01 15:14 Subject: Re: RFR: 8187033: [PPC] Imporve performance of ObjectStreamClass.getClassDataLayout() On 08/31/2017 09:39 PM, Hans Boehm wrote: >> I guess you can make VarHandle.fullFence() between getClassDataLayout0() and storing it to the >> non-volatile field... > > ... with the usual warning about this sort of thing: > > According to the JMM, it's not guaranteed to work, because the reader-side guarantees are not > there. In practice, you're relying on dependency-based ordering, which the compiler is currently > unlikely to break in this case. But future implementations may. Right. > I presume the real concern here is the cost on the reader side? Presumably that could be reduced > with a VarHandle getAcquire(). I believe that eliminates the heavy-weight sync, and just keeps > an lwsync. Imperfect, but better. Oh right! This is exactly why acq/rel exist. Since OSC is a heavily used class, the Unsafe counterparts might be better: private static final Unsafe U = ...; private static final long CDS_OFFSET = U.objectFieldOffset(...); private volatile ClassDataSlot[] dataLayout; // volatile for safety of naked reads ClassDataSlot[] getClassDataLayout() throws InvalidClassException { ClassDataSlot[] slots = U.getObjectAcquire(this, CDS_OFFSET); if (slots == null) { slots = getClassDataLayout0(); U.putObjectRelease(this, CDS_OFFSET, slots); } return slots; } Ogata, please try if that indeed helps on POWER? Thanks, -Aleksey [attachment "signature.asc" deleted by Kazunori Ogata/Japan/IBM]