On Thu, 28 Oct 2021 08:47:31 GMT, Aleksey Shipilev <sh...@openjdk.org> wrote:

> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for 
> `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) 
> to call to runtime for a single memory barrier. We can simplify the native 
> `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` 
> intrinsics are not available. This would be similar to what 
> `Unsafe.{loadLoad|storeStore}Fences` do. 
> 
> This is the behavior of these intrinsics now, on x86_64, using benchmarks 
> from JDK-8276054:
> 
> 
> Benchmark          Mode  Cnt  Score   Error  Units
> 
> # Default
> Single.acquire     avgt    3   0.407 ± 0.060  ns/op
> Single.full        avgt    3   4.693 ± 0.005  ns/op
> Single.loadLoad    avgt    3   0.415 ± 0.095  ns/op
> Single.plain       avgt    3   0.406 ± 0.002  ns/op
> Single.release     avgt    3   0.408 ± 0.047  ns/op
> Single.storeStore  avgt    3   0.408 ± 0.043  ns/op
> 
> # -XX:DisableIntrinsic=_storeFence
> Single.acquire     avgt    3   0.408 ± 0.016  ns/op
> Single.full        avgt    3   4.694 ± 0.002  ns/op
> Single.loadLoad    avgt    3   0.406 ± 0.002  ns/op
> Single.plain       avgt    3   0.406 ± 0.001  ns/op
> Single.release     avgt    3   4.694 ± 0.003  ns/op <--- upgraded to full
> Single.storeStore  avgt    3   4.690 ± 0.005  ns/op <--- upgraded to full
> 
> # -XX:DisableIntrinsic=_loadFence
> Single.acquire     avgt    3   4.691 ± 0.001  ns/op <--- upgraded to full
> Single.full        avgt    3   4.693 ± 0.009  ns/op
> Single.loadLoad    avgt    3   4.693 ± 0.013  ns/op <--- upgraded to full
> Single.plain       avgt    3   0.408 ± 0.072  ns/op
> Single.release     avgt    3   0.415 ± 0.016  ns/op
> Single.storeStore  avgt    3   0.416 ± 0.041  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence
> Single.acquire     avgt    3   0.406 ± 0.014  ns/op
> Single.full        avgt    3  15.836 ± 0.151  ns/op <--- calls runtime
> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
> Single.plain       avgt    3   0.426 ± 0.361  ns/op
> Single.release     avgt    3   0.407 ± 0.021  ns/op
> Single.storeStore  avgt    3   0.410 ± 0.061  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence,_loadFence
> Single.acquire     avgt    3  15.822 ± 0.282  ns/op <--- upgraded, calls runtime
> Single.full        avgt    3  15.851 ± 0.127  ns/op <--- calls runtime
> Single.loadLoad    avgt    3  15.829 ± 0.045  ns/op <--- upgraded, calls runtime
> Single.plain       avgt    3   0.406 ± 0.001  ns/op
> Single.release     avgt    3   0.414 ± 0.156  ns/op
> Single.storeStore  avgt    3   0.422 ± 0.452  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence,_storeFence
> Single.acquire     avgt    3   0.407 ± 0.016  ns/op
> Single.full        avgt    3  15.347 ± 6.783  ns/op <--- calls runtime
> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
> Single.plain       avgt    3   0.406 ± 0.002  ns/op 
> Single.release     avgt    3  15.828 ± 0.019  ns/op <--- upgraded, calls runtime
> Single.storeStore  avgt    3  15.834 ± 0.045  ns/op <--- upgraded, calls runtime
> 
> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence
> Single.acquire     avgt    3  15.838 ± 0.030  ns/op <--- upgraded, calls runtime
> Single.full        avgt    3  15.854 ± 0.277  ns/op <--- calls runtime
> Single.loadLoad    avgt    3  15.826 ± 0.160  ns/op <--- upgraded, calls runtime
> Single.plain       avgt    3   0.406 ± 0.003  ns/op
> Single.release     avgt    3  15.838 ± 0.019  ns/op <--- upgraded, calls runtime
> Single.storeStore  avgt    3  15.844 ± 0.104  ns/op <--- upgraded, calls runtime
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1`

I'm not quite seeing the motivation here. Your claim is that the non-intrinsic
implementations involve a native call, which is too expensive; yet the new
code still relies on `fullFence` being intrinsified, otherwise it is still a
native call, and now with a heavier barrier. If these fences were intrinsified
piecemeal, then perhaps this is an issue on some platform, but is that really
the case? If you intrinsified one, wouldn't you intrinsify all?
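
To make the shape of the proposed fallback concrete, here is a minimal sketch
in Java. It is an illustration under stated assumptions, not the actual
`Unsafe` code: the class `FenceFallbackSketch` and its call counter are
hypothetical, and the public `VarHandle.fullFence()` stands in for the
internal `Unsafe.fullFence`.

```java
import java.lang.invoke.VarHandle;

// Hypothetical illustration class; this is NOT the jdk.internal.misc.Unsafe source.
final class FenceFallbackSketch {
    // Counts how often the fallback path ran (for illustration only).
    private static int fullFenceCalls = 0;

    // Stand-in for a fullFence that may or may not be intrinsified.
    static void fullFence() {
        fullFenceCalls++;
        VarHandle.fullFence(); // public API, assumed here as the strongest fence
    }

    // Proposed fallback shape: with no dedicated intrinsic, upgrade to a full fence.
    static void loadFence() {
        fullFence();
    }

    static void storeFence() {
        fullFence();
    }

    static int fullFenceCalls() {
        return fullFenceCalls;
    }

    public static void main(String[] args) {
        loadFence();
        storeFence();
        System.out.println("full fences issued: " + fullFenceCalls());
    }
}
```

A full fence is at least as strong as an acquire or release fence, so the
upgrade is safe; whether it is cheap depends entirely on `fullFence` itself
being intrinsified, which is the concern raised above.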

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149
