On Aug 7, 2020, at 9:52 AM, Andrew Haley <a...@redhat.com> wrote: > > On 8/7/20 4:39 PM, David Lloyd wrote: > > > However, I'm wondering if this isn't a side effect of what appears > > to be an incorrect double-checked lock at lines 374-378 of > > ClassValue.java [1]. In order for the write to the non-volatile > > `cache` field of ClassValueMap, it is my understanding that there > > must be some acquire/release edge from where the variable is > > published to guarantee that all of the writes are visible to the > > read site, and I'm not really sure that the exit-the-constructor > > edge is going to match up with with the synchronized block which > > protects a different non-volatile field. And even if my feeling > > that this is dodgy is valid, I'm not sure whether this NPE is a > > possible outcome of that problem! > > It certainly doesn't look right to me. AFAICS this is classic broken > double-checked locking. It'd need some sort of fence after > constructing the ClassValueMap instance and before publishing it.
Y’all are calling my baby ugly but he’s really cute if you look at him from the right angle. (First, a quick question: This bug shows up on x86, right? If so, we probably are not looking at a hardware reordering problem but something stemming from Graal’s reordering of operations compared to C1 and C2, perhaps after a different set of inlining decisions uncharacteristic of C2.) The bug indeed looks like a dropped fence at the end of the constructor for ClassValueMap. (I don’t see an easier way to get a null at the NPE point.) The fence might not actually be dropped, but a write which the fence is supposed to fence might have been reordered after the fence, and after an unlock operation, allowing someone to see the initial null after the unlock. The double-check idiom breaks when the outer check (not synchronized) gets half-baked data and acts on it. This much-maligned idiom would not be broken in the present case under the assumption that a data dependency on type.classValueMap (set in initializeMap) will see a fully initialized value of ClassValueMap.cacheArray. Now we get into the fine print, and I agree there is a bug here. The easy fix is to repair the logic under which that ugly everybody’s-hating-on-it double check idiom would be in fact correct. The fine print requires a couple different things that are not quite fulfilled by the present code, and Graal has either reordered the memory accesses to expose the problem, or it (itself) has a complementary bug. (Or it has an unrelated bug, and you people are *staring* at my baby!) First, as Paul noted, you need a final variable in your class to get the benefit of a fence at the end of the constructor. Oops #1: ClassValueMap doesn’t have *any* finals. That’s awkward. Two fixes for that: (a) add a dummy final, and (b) add a real fence in the constructor. For better luck, maybe put the fence at the end of sizeCache. On machines (not x86) which need fences *before* final reads, another explicit fence should be placed before the unsynchronized read (or else inside the getCache method). By the letter of the JMM, the helpful effects of the fence at the end of the constructor are only guaranteed for references which depend on final variables set by that constructor, because only those final variables get a “freeze” operation which stabilizes them (and values dependently loaded via them). Throwing in a dummy final is therefore not guaranteed to work. But throwing in a fence will work. One downside of the fence is that the JMM is silent about fences, but the JMM is widely recognized as a partial story, not the full or final story. (Here’s a tidbit of JMM politics: I once heard Doug Lea considering whether maybe all fields should be treated more like final fields. I don’t know if this is still a live idea, but it would make this bug go way, since nearly all constructors would then get fences of some sort.) Here’s a bit of explanatory (non-normative) text from the draft POPL paper for JMM, where the parenthetic comment indicates (I think) the sort of thing that is happening here: > Let us now assume that the acquire and release do happen. As long as > these actions take place after object has been constructed (and there > is no code motion around the end of the constructor), the diffs that > the second processor acquires are guaranteed to reflect the correctly > constructed object. Basically, the synchronization statement is expected to do a release *after* the non-null value is stored. It looks like this is failing due to a lost and/or reordered fence. Second, David says: > I’m not really sure that the exit-the-constructor edge is going to match > up with with the synchronized block which protects a different > non-volatile field. By the letter of the JMM (AFAIUI), this code is coloring way outside the lines. (Yes, I admit it.) The problem is pretty fundamental. I went and swapped in some JMM spec just now (don’t have it in short term memory; go figure), and here’s what I re-learned. In JMM terms, a “release” is the earlier end of any edge in relation called “synchronizes-with”. An “acquire” is the latter end of such an edge. In the JMM this relation applies only to lock, unlock, and volatile reads and writes. Your guess is as good as mine whether other operations (CAS, fences, etc.) participate; the JMM is silent. Crucially, acquires and release do not touch plain fields. This is counter-intuitive for those of us who like to reason with fences. The JMM prevents regular writes from floating past what we like to think of as “release points” (unlocks) by appealing to processor order inside lock/unlock pairs, and also by appealing to global ordering of multiple lock/unlock pairs (or stuff with volatiles and other corner cases which we will neglect here). In order to work right, a normal write has to come before a release *and* a matching normal read has to come after an acquire, and such an acquire is nothing other than the lock of a later lock/unlock pair, typically in another thread. So the rules which reliably connect normal writes to normal reads in other threads work differently from acquire fences and release fences as we (or just I?) usually think of them. Yes, there are “acquires” and “releases” in the JMM. No, they are not primitives but rather descriptions of the way the happens-before relation is constrained by (among other things) lock and unlock actions. On x86, you don’t have to worry about acquires; you just set up the right release fences. In the JMM, there’s no way to get either an acquire or a release without a synchronized statement (or other fancy stuff like volatiles). That’s why double-check is broken in the pure JMM, and why it can be repaired if you add some (non-JMM) fences. So why is this showing up suddenly with Graal? Possibly it’s got a really aggressive interpretation of the JMM, relative to the standard HotSpot JITs. Possibly it’s got a bug. Graal might be allowing the initialization of ClassValueMap.cacheArray to float down past publication of the ClassValueMap on type.classValueMap (in initializeMap). Less likely, it is allowing the initialization to totally escape the lock/unlock pair, something the JMM might allow but the Cookbook advises against (in its “Can Reorder” table). Perhaps Graal is modeling the “freeze” operations on finals more accurately (and the Cookbook gives instructions for this). That would allow the ClassValueMap to correctly initialize its frozen finals, but not (unfortunately) the not-frozen cacheArray field. It appears that the new JIT on the block is exposing my neglect for my poor baby, in not putting the right fences around its playpen. Bottom line: Add a release fence before type.cVM is set, and add a (no-op on x86) acquire fence in getCache. Quick test: Add a dummy final to CVM, to see if that makes the bug let go. — John