Ok mystery solved, the Put thread was stuck in a wait() loop inside HRegion which catches InterruptedException, eats it then goes back to waiting for a condition that will never clear (memstore too big).
Attempted solution is to bring up the done flag, flush the cache, then wait for the put thread to join. The problem manifests itself on Linux with JDK 1.6.0_19. In the end it wasnt a JDK bug so. Having said that, I think it's good for people to read this: http://www.cs.umd.edu/users/pugh/java/memoryModel/jsr-133-faq.html Which explains how volatile works. I had some mistaken impressions about how it works, and more importantly I was confused about how monitor blocks and field read/writes work. In short: if you use a monitor block , you dont need volatile. -ryan On Fri, Apr 16, 2010 at 12:25 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote: > I think I've seen stuff like that too when writing the replication > code, and it only seems to happen on OSX. > > @Ryan, did you try running the same test a couple of times on a linux box? > > J-D > > On Fri, Apr 16, 2010 at 2:29 AM, Ryan Rawson <ryano...@gmail.com> wrote: >> At least it's in test code right? >> >> *sigh* >> >> On Thu, Apr 15, 2010 at 5:27 PM, Todd Lipcon <t...@cloudera.com> wrote: >>> On Thu, Apr 15, 2010 at 5:26 PM, Ryan Rawson <ryano...@gmail.com> wrote: >>> >>>> Looking at the code to AtomicBoolean it uses an atomic int to >>>> accomplish it's task on OSX. >>>> >>>> So just what the heck is going on here? >>>> >>>> >>> I think you're fooling yourself, and the bug isn't gone, just hiding :) >>> >>> -Todd >>> >>> >>>> On Thu, Apr 15, 2010 at 5:24 PM, Ryan Rawson <ryano...@gmail.com> wrote: >>>> > I doubt it's case #2, there is a lot of complex code that runs between >>>> > putThread.start() and putThread.done(). >>>> > >>>> > In terms of JVMs, I'm using Java 6 on OSX x64. HBase effectively >>>> > requires Java 6 (and if we dont explicitly require it, we should) and >>>> > it also specifically cannot use certain broken JVM pushes (eg: >>>> > jdk6u18) so much that we are adding in code to prevent ourselves from >>>> > running on it and warning the user. >>>> > >>>> > But just for a moment, I think it's inappropriate for the JVM to be >>>> > specifying caching or non-caching of variables in the systems cache. >>>> > That is way too much abstraction leakage up to the language level. >>>> > Most SMP systems have cache coherency control that allow you to read >>>> > from cache yet get invalidations when other processors (on other dies) >>>> > write to that memory entry. >>>> > >>>> > But nevertheless, the problem no longer exists with AtomicBoolean :-) >>>> > >>>> > >>>> > >>>> > On Thu, Apr 15, 2010 at 5:05 PM, Paul Cowan <co...@aconex.com> wrote: >>>> >> On -9/01/37 05:59, Ryan Rawson wrote: >>>> >>> >>>> >>> So the previous use of volatile for a boolean seems like a textbook >>>> >>> case, but the situation i discovered was pretty clear cut. I have no >>>> >>> other explanation than a highly delayed volatile read (which are >>>> >>> allowed). >>>> >> >>>> >> I don't see that they are allowed, actually. >>>> >> >>>> >> Section 17.4.5 of the JLS says that: >>>> >> >>>> >>> * An unlock on a monitor happens-before every subsequent lock on that >>>> >>> monitor. >>>> >>> * A write to a volatile field (§8.3.1.4) happens-before every >>>> subsequent >>>> >>> read of that field. >>>> >> >>>> >> IOW, the situations (unlock-then-lock) and (volatile-write then >>>> >> volatile-read) have the same visibility guarantees. >>>> >> >>>> >> Section 8.3.1.4 says: >>>> >> >>>> >>> A field may be declared volatile, in which case the Java memory model >>>> >>> (§17) ensures that all threads see a consistent value for the >>>> variable. >>>> >> >>>> >> In your case, the thread calling done() is not seeing the same value as >>>> the >>>> >> thread calling run(), which is not consistent. >>>> >> >>>> >> And for good measure Java Concurrency in Practice makes it much more >>>> >> explicit (emphasis mine): >>>> >> >>>> >>> Volatile variables are not cached in registers or in caches where they >>>> are >>>> >>> hidden from other processors, so *a read of a volatile variable always >>>> >>> returns the most recent write by any thread*. >>>> >> >>>> >> And finally, on changing to an AtomicBoolean fixing the problem, JCIP >>>> says: >>>> >> >>>> >>> Atomic variables offer the same memory semantics as volatile variables >>>> >> >>>> >> So this doesn't really make sense either. >>>> >> >>>> >> All that's a long way of saying that the only ways I can see your >>>> situation >>>> >> happening are: >>>> >> >>>> >> * pre-Java-1.5 (and hence pre-JSR-133) JVM >>>> >> * JVM with a bug >>>> >> * ordering is not as you expect, i.e. the actual chronological order is >>>> not: >>>> >> >>>> >> THREAD 1 THREAD 2 >>>> >> spawn new thread >>>> >> run() >>>> >> done() >>>> >> join() >>>> >> >>>> >> but rather: >>>> >> >>>> >> THREAD 1 THREAD 2 >>>> >> spawn new thread >>>> >> done() >>>> >> run() >>>> >> join() >>>> >> >>>> >> in which case the set of run to false at the start of run() overwrites >>>> the >>>> >> set of it to true at the start of done(), and you're in for infinite >>>> loop >>>> >> fun. >>>> >> >>>> >> Cheers, >>>> >> >>>> >> Paul >>>> >> >>>> > >>>> >>> >>> >>> >>> -- >>> Todd Lipcon >>> Software Engineer, Cloudera >>> >> >