On Thu, 28 Oct 2021 02:32:41 GMT, David Holmes <dhol...@openjdk.org> wrote:
>> `Unsafe.storeStoreFence` currently delegates to stronger >> `Unsafe.storeFence`. We can teach compilers to map this directly to already >> existing rules that handle `MemBarStoreStore`. Like explicit >> `LoadFence`/`StoreFence`, we introduce the special node to differentiate >> explicit fence and implicit store-store barriers. This node is usually used >> to simulate safe `final`-field like constructions in special JDK classes, >> like `ConstantCallSite` and friends. >> >> Motivational performance difference on benchmarks from JDK-8276054 on ARM32: >> >> >> Benchmark Mode Cnt Score Error Units >> Multiple.plain avgt 3 2.669 ± 0.004 ns/op >> Multiple.release avgt 3 16.688 ± 0.057 ns/op >> Multiple.storeStore avgt 3 14.021 ± 0.144 ns/op // Better >> >> MultipleWithLoads.plain avgt 3 4.672 ± 0.053 ns/op >> MultipleWithLoads.release avgt 3 16.689 ± 0.044 ns/op >> MultipleWithLoads.storeStore avgt 3 14.012 ± 0.010 ns/op // Better >> >> MultipleWithStores.plain avgt 3 14.687 ± 0.009 ns/op >> MultipleWithStores.release avgt 3 45.393 ± 0.192 ns/op >> MultipleWithStores.storeStore avgt 3 38.048 ± 0.033 ns/op // Better >> >> Publishing.plain avgt 3 27.079 ± 0.201 ns/op >> Publishing.release avgt 3 27.088 ± 0.241 ns/op >> Publishing.storeStore avgt 3 27.009 ± 0.259 ns/op // Within >> error, hidden by allocation >> >> Single.plain avgt 3 2.670 ± 0.002 ns/op >> Single.releaseFence avgt 3 6.675 ± 0.001 ns/op >> Single.storeStoreFence avgt 3 8.012 ± 0.027 ns/op // Worse, >> seems to be ARM32 implementation artifact >> >> >> As expected, this does not affect x86_64 at all, because both `release` and >> `storeStore` are effectively no-ops, only affecting compiler optimizations: >> >> >> Benchmark Mode Cnt Score Error Units >> >> Multiple.plain avgt 3 0.406 ± 0.002 ns/op >> Multiple.release avgt 3 0.409 ± 0.018 ns/op >> Multiple.storeStore avgt 3 0.406 ± 0.001 ns/op >> >> MultipleWithLoads.plain avgt 3 4.328 ± 0.006 ns/op >> MultipleWithLoads.release avgt 3 4.600 ± 0.014 ns/op >> MultipleWithLoads.storeStore avgt 3 4.602 ± 0.006 ns/op >> >> MultipleWithStores.plain avgt 3 0.812 ± 0.001 ns/op >> MultipleWithStores.release avgt 3 0.812 ± 0.002 ns/op >> MultipleWithStores.storeStore avgt 3 0.812 ± 0.002 ns/op >> >> Publishing.plain avgt 3 6.370 ± 0.059 ns/op >> Publishing.release avgt 3 6.358 ± 0.436 ns/op >> Publishing.storeStore avgt 3 6.367 ± 0.054 ns/op >> >> Single.plain avgt 3 0.407 ± 0.039 ns/op >> Single.releaseFence avgt 3 0.406 ± 0.001 ns/op >> Single.storeStoreFence avgt 3 0.406 ± 0.001 ns/op >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` > > src/java.base/share/classes/jdk/internal/misc/Unsafe.java line 3449: > >> 3447: public final void storeStoreFence() { >> 3448: // Without the special intrinsic, default to a stronger >> storeFence, >> 3449: // which is already intrinsified. > > Not clear me to me why we need to retain this fallback? Something should happen when intrinsic is disabled. Other fences have native `Unsafe_{Load|Store|Full}Fence` entry points for this. We can, technically, do the same here, but I see no need. Instead, we can fall back to the already implemented stronger intrinsic. ------------- PR: https://git.openjdk.java.net/jdk/pull/6136