Re: RFR: 8187033: [PPC] Improve performance of ObjectStreamClass.getClassDataLayout()

Peter Levart Sun, 17 Sep 2017 13:53:24 -0700

Hi,

On 09/15/17 19:54, Hans Boehm wrote:

The problem occurs if this is transformed (by hardware or compiler) to
1196 ClassDataSlot[] getClassDataLayout() throws InvalidClassException {
1197         // REMIND: synchronize instead of relying on fullFence()?
                 <prefetch dataLayout>
1198         ClassDataSlot[] slots = DATA_LAYOUT_GUESS;
1199         if (slots == null) {
                     if (dataLayout != DATA_LAYOUT_GUESS) <recover>
1200             slots = getClassDataLayout0();
1201             VarHandle.fullFence();
1202             dataLayout = slots;
1203         }
1204b       tmp = slots[17];
1204.5      if (dataLayout != DATA_LAYOUT_GUESS) <recover>
1205     ...
(This is only an illustration. If the problem were to occur in real life, it would probably occur as a result of a different optimization. DEC Alpha allowed this sort of thing for entirely different reasons.)
Observe that
(1) This transformation is allowed by the Java memory model, since dataLayout is not a final field.
(2) This code breaks if another thread runs all of the initialization code, including the code that sets slots[17] and the code that sets dataLayout, between 1204b and 1204.5, but the check in 1204.5 still succeeds (because we guessed well). tmp will contain the pre-initialization value of slots[17].
The fence is not executed by the reading thread, and has no impact on ordering within the reading thread.
C++ fences have no effect unless they are paired with another fence or ordered atomic operation in the other thread involved in the communication. I think that is the current intent for Java as well.
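
To illustrate the pairing point above, here is a minimal sketch of a publication in which both sides take part: the writer does a release store and the reader does an acquire load. It is not taken from any of the webrevs. The class PairedPublication, its layout field and computeLayout() are hypothetical names, and it uses VarHandle access modes rather than Unsafe.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class, for illustration only; not part of ObjectStreamClass.
class PairedPublication {
    private int[] layout; // plain field, accessed only through LAYOUT below

    private static final VarHandle LAYOUT;
    static {
        try {
            LAYOUT = MethodHandles.lookup()
                    .findVarHandle(PairedPublication.class, "layout", int[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int[] getLayout() {
        // Acquire load: pairs with the release store below, so a non-null
        // result implies the array contents written before the store are visible.
        int[] l = (int[]) LAYOUT.getAcquire(this);
        if (l == null) {
            l = computeLayout();
            // Release store: orders the array initialization before publication.
            LAYOUT.setRelease(this, l);
        }
        return l;
    }

    // Stand-in for the real initialization work (assumption).
    private static int[] computeLayout() {
        int[] a = new int[18];
        a[17] = 42;
        return a;
    }
}
```

Unlike a fence executed only by the initializing thread, the acquire load here is performed by every reader, which is what gives the reader its ordering guarantee.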


Well, in that case, it's better to stick with final fields...

@Ogata

You said you implemented 4 variants:

On 09/04/17 07:20, Kazunori Ogata wrote:

1) Put VarHandle.fullFence() between initialization of ClassDataSlot[] and
writing the reference to non-volatile dataLayout.
   Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-fence/

2) Use Unsafe.getObjectAcquire() and Unsafe.putObjectRelease() for
reading/writing dataLayout.
   Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-unsafe/

3) Put reference to ClassDataSlot[] into a final field of an object and
store the object to non-volatile dataLayout.  Every invocation of
getDataLayout() reads the final field, so it needs to dereference the object
pointer.
   Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-final/

4) Put reference to ClassDataSlot[] into a final field of an object, read
the final field immediately after the object creation, and store it to
non-volatile dataLayout.  I think this is also correct based on the
semantics of final fields and data dependency.
   Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-final2/


The performance improvements were:

1) +3.5%
2) +1.1%
3) +2.2%
4) +3.4%

The 1st and 4th are not correct, as we have established. The 3rd is promising, but does not give the largest speed improvement, perhaps because of the extra dereference.

What if 'dataLayout' were not an array of ClassDataSlot records, each of them containing a reference to an ObjectStreamClass and a boolean, but an object containing two arrays:


    static class ClassDataLayout {
        final ObjectStreamClass[] descs;
        final boolean[] hasDatas;
    }

Such an object could be "unsafely" published. By eliminating the intermediate ClassDataSlot object, the number of dereferences should be kept down.
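
A minimal, self-contained sketch of that idea follows. It is not the code in the webrev: Desc is a hypothetical stand-in for ObjectStreamClass so the example compiles alone, and the constructor, LayoutOwner and the sample data are assumptions made for illustration. It relies on final-field semantics: a reader that obtains the holder through a data race still sees the arrays as they were when the constructor finished.

```java
// Hypothetical stand-in for java.io.ObjectStreamClass (assumption).
class Desc {
    final String name;
    Desc(String name) { this.name = name; }
}

// Holder with only final fields; the constructor is an assumption.
final class ClassDataLayout {
    final Desc[] descs;
    final boolean[] hasDatas;

    ClassDataLayout(Desc[] descs, boolean[] hasDatas) {
        this.descs = descs;
        this.hasDatas = hasDatas;
    }
}

// Hypothetical owner showing the racy ("unsafe") publication.
class LayoutOwner {
    private ClassDataLayout dataLayout; // deliberately non-volatile

    ClassDataLayout getClassDataLayout() {
        ClassDataLayout l = dataLayout; // read the racy field exactly once
        if (l == null) {
            // Racing threads may each build an equivalent layout; last write wins.
            l = new ClassDataLayout(
                    new Desc[] { new Desc("Base"), new Desc("Derived") },
                    new boolean[] { true, false });
            dataLayout = l;
        }
        return l;
    }
}
```

The arrays must be fully populated before the constructor returns and never modified afterwards; otherwise the final-field guarantee no longer covers their contents.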


Here's a prototype:

http://cr.openjdk.java.net/~plevart/jdk10-dev/8187033_ObjectStreamClass.dataLayout/webrev.01/

Could you give it a try in your benchmark and compare it with your last approach (with fullFence)?


Regards, Peter
