On 05/24/2016 05:43 AM, John Rose wrote:
> On May 23, 2016, at 4:20 PM, Martin Buchholz wrote:
>>
>> As I said in a previous message, you can implement subword CAS using
>> fullword CAS in a loop.
>>
>> cas8bit(expect, update) {
>>   for (;;) {
>>     fullword = atomicRead32()
>>     if ((fullword & 0xff) != expect) return false;
>>     if (cas32(fullword, (fullword & ~0xff) | update)) return true;
>>   }
>> }

Yes, stupid me! I was under the impression that loops are a no-no for
emulating strong CAS. But we already do loops with LL/SC...

> Yes, that's the "artisanal" version I would reach for.
> It doesn't scale well if there is unrelated activity on nearby bytes.

Okay, we are exploring it here:
  https://bugs.openjdk.java.net/browse/JDK-8157726

I was able to intrinsify subword accesses on x86_64, and their
performance is on par with the int versions. Plain Martin-style Java loops
are around 2x slower than direct intrinsics in a few basic tests (I expect
them to be even slower in contended cases and/or on non-x86 platforms).

But first, we need to hook them up to VarHandles (in progress now).

Thanks,
-Aleksey
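
For reference, the quoted loop could be sketched in plain Java roughly as
follows. This is only an illustration, not the code under JDK-8157726: it
assumes an AtomicInteger standing in for atomicRead32()/cas32(), and the
class name SubwordCas and the byteIndex parameter are hypothetical additions
so that any of the four bytes can be targeted.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SubwordCas {
    // Hypothetical helper: strong CAS on one byte of a 32-bit word,
    // emulated with a full-word CAS in a retry loop.
    // byteIndex 0 is the least significant byte (an assumption of this sketch).
    static boolean cas8bit(AtomicInteger word, int byteIndex, int expect, int update) {
        int shift = byteIndex * 8;
        int mask = 0xff << shift;
        for (;;) {
            int fullword = word.get();
            // Fail only if the target byte itself differs from the expected value.
            if ((fullword & mask) != ((expect & 0xff) << shift)) return false;
            int next = (fullword & ~mask) | ((update & 0xff) << shift);
            if (word.compareAndSet(fullword, next)) return true;
            // Full-word CAS failed: some byte in the word changed
            // (possibly an unrelated neighbour), so re-read and retry.
        }
    }

    public static void main(String[] args) {
        AtomicInteger word = new AtomicInteger(0x11223344);
        System.out.println(cas8bit(word, 0, 0x44, 0x55));    // true
        System.out.println(cas8bit(word, 0, 0x44, 0x66));    // false
        System.out.println(Integer.toHexString(word.get())); // 11223355
    }
}
```

The retry on a failed full-word CAS is what makes this behave like a strong
subword CAS, and it is also where the scaling concern comes from: writes to
neighbouring bytes force extra iterations even though the target byte is
untouched.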
