> On 24 May 2016, at 21:29, Aleksey Shipilev <[email protected]> wrote:
>
> On 05/24/2016 05:43 AM, John Rose wrote:
>> On May 23, 2016, at 4:20 PM, Martin Buchholz <[email protected]> wrote:
>>>
>>> As I said in a previous message, you can implement subword CAS using
>>> fullword CAS in a loop.
>>>
>>> cas8bit(expect, update) {
>>>   for (;;) {
>>>     fullword = atomicRead32();
>>>     if ((fullword & 0xff) != expect) return false;
>>>     if (cas32(fullword, (fullword & ~0xff) | update)) return true;
>>>   }
>>> }
>
> Yes, stupid me! I was under impression that loops are no-no to emulate
> strong CAS. But we do loops already with LL/SC…
Indeed, doh! Martin, many thanks for persisting with this.

>
>> Yes, that's the "artisanal" version I would reach for.
>> It doesn't scale well if there is unrelated activity on nearby bytes.
>
> Okay, we are exploring it here:
>   https://bugs.openjdk.java.net/browse/JDK-8157726
>
> I was able to intrinsify subword accesses on x86_64, and their
> performance is on par with int versions. Plain Martin-style Java loops
> are around 2x slower than direct intrinsics in a few basic tests (I
> expect them to be even slower on contended cases and/or non-x86
> platforms). But first, we need to hook them up to VarHandles (in
> progress now).
>

Nice work! This is looking very promising on x86.

Paul.
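For reference, here is a minimal Java sketch of the fullword-CAS loop Martin describes in the quote above. It assumes a packed `java.util.concurrent.atomic.AtomicIntegerArray` as the backing store; `SubwordCas`, `get` and `compareAndSetByte` are hypothetical names for illustration only, not the VarHandle API being wired up under JDK-8157726. Byte lanes are assigned logically (byte `i` lives in lane `i & 3` of word `i >>> 2`), so physical endianness does not matter in this sketch.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical helper: a strong byte-wide CAS emulated with 32-bit CAS.
final class SubwordCas {
    private final AtomicIntegerArray words;

    SubwordCas(int nBytes) {
        // Four logical byte lanes per int word.
        words = new AtomicIntegerArray((nBytes + 3) / 4);
    }

    byte get(int byteIndex) {
        int word = words.get(byteIndex >>> 2);
        int shift = (byteIndex & 3) * 8;
        return (byte) (word >>> shift); // keep only the target lane
    }

    boolean compareAndSetByte(int byteIndex, byte expect, byte update) {
        int wordIndex = byteIndex >>> 2;
        int shift = (byteIndex & 3) * 8;           // lane offset in bits
        int mask = 0xff << shift;                  // selects the target lane
        int expectBits = (expect & 0xff) << shift; // & 0xff drops sign extension
        int updateBits = (update & 0xff) << shift;
        for (;;) {
            int word = words.get(wordIndex);
            if ((word & mask) != expectBits) {
                return false; // target byte differs: a genuine CAS failure
            }
            int newWord = (word & ~mask) | updateBits; // neighbors untouched
            if (words.compareAndSet(wordIndex, word, newWord)) {
                return true;
            }
            // The word changed, possibly only in a neighboring byte: retry.
        }
    }

    public static void main(String[] args) {
        SubwordCas bytes = new SubwordCas(8);
        System.out.println(bytes.compareAndSetByte(5, (byte) 0, (byte) 42)); // true
        System.out.println(bytes.compareAndSetByte(5, (byte) 0, (byte) 7));  // false
        System.out.println(bytes.get(5)); // 42
        System.out.println(bytes.get(4)); // 0
    }
}
```

The retry branch is where John's scaling concern shows up: a concurrent write to any other byte in the same word makes the 32-bit `compareAndSet` fail and forces another iteration, even though the target byte never changed.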
