g00dn3ss wrote:
So the thing that doesn't make sense to me in your statement above is that I
stop the entire HBase instance, including the master, and then restart the
system. Some of the regionservers get "FileNotFound" exceptions when
looking for some of the corrupted files, and then the affected regionservers
shut down. So I don't understand how the problem I'm seeing could be
caused by having something in memory that doesn't match what's on disk if I
am starting the entire system from scratch.

Agreed.  Something else is going on.  Can we see logs?

The other issue that causes further problems in this case is when one of
these problematic regions is on the same regionserver as the -ROOT- region.
When the regionserver holding the -ROOT- region crashes, the entire system
seems to go down.  Is this what
http://issues.apache.org/jira/browse/HBASE-1080 is about?

No. 1080 is about an odd deadlock in the master (it may have been resolved by 543).

The system should recover when the regionserver hosting -ROOT- goes down. Which version of HBase are you running (pardon me if you've already said)?

St.Ack
