`Unsafe.storeStoreFence` currently delegates to the stronger `Unsafe.storeFence`. 
We can teach the compilers to map it directly to the already existing rules that 
handle `MemBarStoreStore`. As with the explicit `LoadFence`/`StoreFence`, we introduce 
a special node to differentiate the explicit fence from implicit store-store 
barriers. `storeStoreFence` is usually used to simulate safe `final`-field-like 
constructions in special JDK classes, like `ConstantCallSite` and friends.
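
For illustration, a minimal sketch of the kind of `final`-like construction meant here. It assumes the public `VarHandle.storeStoreFence()` as a stand-in for the internal `Unsafe.storeStoreFence`, so that it runs without extra flags; the class `FrozenBox` and its fields are hypothetical and not taken from the JDK sources.

```java
import java.lang.invoke.VarHandle;

// Hypothetical example class: fields are initialized once in the constructor
// but are deliberately not declared final.
final class FrozenBox {
    private int value;
    private boolean frozen;

    FrozenBox(int value) {
        this.value = value;
        this.frozen = true;
        // Explicit store-store fence: the stores above may not be reordered
        // with stores issued after this point, such as the store that later
        // publishes the reference to this object.
        VarHandle.storeStoreFence();
    }

    int get() {
        if (!frozen) {
            throw new IllegalStateException("not frozen");
        }
        return value;
    }

    public static void main(String[] args) {
        FrozenBox box = new FrozenBox(42);
        System.out.println(box.get());
    }
}
```

The fence sits at the end of the constructor, where a `final` field would get its implicit store-store barrier. Only store-store ordering is requested here, which is weaker than what a release fence would impose.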

Motivating performance difference on the benchmarks from JDK-8276054, on ARM32:


Benchmark                      Mode  Cnt   Score    Error  Units
Multiple.plain                 avgt    3   2.669 ±  0.004  ns/op
Multiple.release               avgt    3  16.688 ±  0.057  ns/op
Multiple.storeStore            avgt    3  14.021 ±  0.144  ns/op // Better

MultipleWithLoads.plain        avgt    3   4.672 ±  0.053  ns/op
MultipleWithLoads.release      avgt    3  16.689 ±  0.044  ns/op
MultipleWithLoads.storeStore   avgt    3  14.012 ±  0.010  ns/op // Better

MultipleWithStores.plain       avgt    3  14.687 ±  0.009  ns/op
MultipleWithStores.release     avgt    3  45.393 ±  0.192  ns/op
MultipleWithStores.storeStore  avgt    3  38.048 ±  0.033  ns/op // Better

Publishing.plain               avgt    3  27.079 ±  0.201  ns/op
Publishing.release             avgt    3  27.088 ±  0.241  ns/op
Publishing.storeStore          avgt    3  27.009 ±  0.259  ns/op // Within error, hidden by allocation

Single.plain                   avgt    3   2.670 ±  0.002  ns/op
Single.releaseFence            avgt    3   6.675 ±  0.001  ns/op
Single.storeStoreFence         avgt    3   8.012 ±  0.027  ns/op // Worse, seems to be ARM32 implementation artifact


As expected, this does not affect x86_64 at all, because both `release` and 
`storeStore` are effectively no-ops, only affecting compiler optimizations:


Benchmark                      Mode  Cnt  Score   Error  Units
Multiple.plain                 avgt    3  0.406 ± 0.002  ns/op
Multiple.release               avgt    3  0.409 ± 0.018  ns/op
Multiple.storeStore            avgt    3  0.406 ± 0.001  ns/op

MultipleWithLoads.plain        avgt    3  4.328 ± 0.006  ns/op
MultipleWithLoads.release      avgt    3  4.600 ± 0.014  ns/op
MultipleWithLoads.storeStore   avgt    3  4.602 ± 0.006  ns/op

MultipleWithStores.plain       avgt    3  0.812 ± 0.001  ns/op
MultipleWithStores.release     avgt    3  0.812 ± 0.002  ns/op
MultipleWithStores.storeStore  avgt    3  0.812 ± 0.002  ns/op

Publishing.plain               avgt    3  6.370 ± 0.059  ns/op
Publishing.release             avgt    3  6.358 ± 0.436  ns/op
Publishing.storeStore          avgt    3  6.367 ± 0.054  ns/op

Single.plain                   avgt    3  0.407 ± 0.039  ns/op
Single.releaseFence            avgt    3  0.406 ± 0.001  ns/op
Single.storeStoreFence         avgt    3  0.406 ± 0.001  ns/op


Additional testing:
 - [x] Linux x86_64 fastdebug `tier1`

-------------

Commit messages:
 - Formatting
 - Little cleanup
 - 8252990: Intrinsify Unsafe.storeStoreFence

Changes: https://git.openjdk.java.net/jdk/pull/6136/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6136&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8252990
  Stats: 39 lines in 16 files changed: 33 ins; 5 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6136.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6136/head:pull/6136

PR: https://git.openjdk.java.net/jdk/pull/6136
