Thank you for writing the list, Kevin. On 1., below, it's not a deadlock but effectively the same thing: an infinite loop. We will be cutting a 0.20.5 RC pretty quickly because of it. At least one other person on this list has seen the issue too, and it has come up on IRC.
2. might just be extra logging that happens in 0.20.4 that wasn't in 0.20.3. If you get a chance, can you file an issue and attach some logs? We'll take a look.

On 3., I know you've been running w/ INFO-level logs, but do you have the logs from the time when this was happening? (We've seen something like this happen in the past with flapping DNS, but 0.20.4 has a fix for that condition, so it must be something else going on.)

Sorry for the inconvenience caused by installing 0.20.4. This release went through 5 RCs -- the most we've ever done -- and was +1'd by at least four different committers. It looks like we have some work to do on our verification of RCs going forward.

Yours,
St.Ack

On Fri, May 14, 2010 at 5:30 PM, Kevin Peterson <kevin...@gmail.com> wrote:
> It's 5pm Friday, so I'm not going into a lot of detail, but we've also seen
> problems with 0.20.4. Specifically:
>
> 1. possible deadlock HBASE-2545
> 2. Regions not getting flushed with message that it exceeds max number of
> store files putting it back on the flush queue.
> 3. More regions assigned to RS than present in META (possibly assigned to
> multiple RS)
>
> We've rolled back to 0.20.3, preferring to bear those ills we have than fly
> to others we know not of. YMMV, our cluster was messed up when we started.