Agreed. Please make an issue. Meantime, it should be possible to have a cron run a script that checks cluster resources from time to time -- e.g. how full HDFS is, how much each regionserver is carrying -- and when it determines the needle is in the red, flip the cluster to be read-only.
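Something along these lines would do as a stopgap. Treat it as an untested Python sketch, not the real thing: the 85% figure is arbitrary, and make_cluster_readonly() is a placeholder, since there is no built-in cluster-wide safe mode yet -- you'd wire in whatever works for you (alter tables READONLY, stop writers at the application tier, or just page an operator).

#!/usr/bin/env python
# Cron-driven watchdog sketch: shell out to "hadoop dfsadmin -report",
# parse the cluster-wide "DFS Used%" figure, and take action when it
# crosses a threshold. A similar check on per-regionserver load could
# be bolted on alongside it.
import re
import subprocess

DFS_USED_THRESHOLD = 85.0  # percent; arbitrary, tune for your cluster

def dfs_used_percent():
    # "hadoop dfsadmin -report" prints a summary that includes a line
    # like "DFS Used%: 42.17%"; pull the number out of it.
    out = subprocess.Popen(["hadoop", "dfsadmin", "-report"],
                           stdout=subprocess.PIPE).communicate()[0]
    m = re.search(r"DFS Used%:\s*([\d.]+)\s*%", out.decode("utf-8"))
    if m is None:
        raise RuntimeError("could not parse dfsadmin -report output")
    return float(m.group(1))

def make_cluster_readonly():
    # Placeholder: no real safe mode exists yet, so substitute whatever
    # is practical -- alter tables READONLY, stop writers, alert someone.
    print("ALERT: DFS nearly full -- putting cluster into read-only mode")

if __name__ == "__main__":
    used = dfs_used_percent()
    if used >= DFS_USED_THRESHOLD:
        make_cluster_readonly()
    else:
        print("DFS used %.1f%% -- under threshold, nothing to do" % used)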
St.Ack

On Mon, Nov 9, 2009 at 9:25 AM, elsif <[email protected]> wrote:
> The larger issue here is that any hbase cluster will reach this tipping
> point at some point in its lifetime as more and more data is added. We
> need to have a graceful method to put the cluster into safe mode until
> more resources can be added or the load on the cluster has been
> reduced. We cannot allow hbase to run itself into the ground causing
> data loss or corruption under any circumstances.
>
> Andrew Purtell wrote:
> > You should consider provisioning more nodes to get beyond this ceiling
> > you encountered.
> >
> > DFS write latency spikes from 3 seconds to 6 seconds, to 15! Flushing
> > cannot happen fast enough to avoid an OOME. Possibly there was even
> > insufficient CPU to GC. The log entries you highlighted indicate the load
> > you are exerting on your current cluster needs to be spread out over more
> > resources than currently allocated.
> >
> > This:
> >
> >> 2009-11-06 09:15:37,144 WARN org.apache.hadoop.hbase.util.Sleeper: We
> >> slept 286007ms, ten times longer than scheduled: 10000
> >
> > indicates a thread that wanted to sleep for 10 seconds was starved for
> > CPU for 286 seconds. Obviously Zookeeper timeouts and resulting HBase
> > process shutdowns, missed DFS heartbeats possibly resulting in spurious
> > declaration of dead datanodes, and other serious problems will result from
> > this.
> >
> > Did your systems start to swap?
> >
> > When region servers shut down, the master notices this and splits their
> > HLogs into per region reconstruction logs. These are the "oldlogfile.log"
> > files. The master log will shed light on why this particular reconstruction
> > log was botched. Would have happened at the master. The region server
> > probably did do a clean shutdown. I suspect DFS was in extremis due to
> > overloading so the split failed. The checksum error indicates incomplete
> > write at the OS level. Did a datanode crash?
> >
> > HBASE-1956 is about making the DFS latency metric exportable via the
> > Hadoop metrics layer, perhaps via Ganglia. Write latency above 1 or 2
> > seconds is a warning. Anything above 5 seconds is an alarm. It's a
> > good indication that an overloading condition is in progress.
> >
> > The Hadoop stack, being pre 1.0, has some rough edges. Response to
> > overloading is one of them. For one thing, HBase could be better about
> > applying backpressure to writing clients when the system is under stress. We
> > will get there. HBASE-1956 is a start.
> >
> > - Andy
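Following on from Andy's HBASE-1956 thresholds above, once the write latency is exportable a watchdog could classify each sample with something as simple as the sketch below; how the sample is obtained (Ganglia, JMX, log scraping) is left open.

# Classify a DFS write latency sample, in milliseconds, against the
# thresholds quoted above: 1-2 seconds is a warning, anything above
# 5 seconds is an alarm and a good sign overloading is in progress.
def classify_write_latency(latency_ms):
    if latency_ms > 5000:
        return "ALARM"
    if latency_ms >= 1000:
        return "WARN"
    return "OK"

# The spikes reported in this thread would classify as:
print(classify_write_latency(3000))   # WARN
print(classify_write_latency(6000))   # ALARM
print(classify_write_latency(15000))  # ALARM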
