Dru Jensen wrote:
St.Ack.  Thanks for your response.

I will enable DEBUG and rerun the MR processes to try and reproduce this.
Hadoop is reporting everything is healthy using fsck.
This is a test platform so the data is not critical but my confidence is shaken (I sound like a day trader).
Understood.

Questions:
1. Is there anything specific I should be looking for when I enable DEBUG?

That's a bit of a tough question; DEBUG output takes some study to interpret. In short, look at the lines before ERRORs and WARNINGs for anything that might explain why the ERROR or WARNING was thrown (NotServingRegionExceptions are part of 'normal' operation -- it's the other exception types you are interested in).
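One rough way to triage a DEBUG-level log along those lines -- a sketch only; the log filename is an assumption, so adjust it to your install -- is to print each problem line with a few lines of leading context and filter out the 'normal' NotServingRegionExceptions:

```shell
# Hypothetical sketch: logs/hbase-regionserver.log is an assumed path.
# Print each ERROR/WARN line with 5 lines of leading context, then drop
# NotServingRegionException, which is part of normal operation.
grep -B5 -E "ERROR|WARN" logs/hbase-regionserver.log \
  | grep -v NotServingRegionException
```

Whatever survives that filter, and the DEBUG lines just before it, is what is worth reading closely.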

2. Does "bad" mean I cannot recover and I need to delete /hbase.rootdir and start over?

Not if fsck says all is ok. I said 'bad' because I thought you would have to do the above.
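For reference, a fuller fsck invocation -- a sketch for a live cluster; the /hbase path is whatever your hbase.rootdir points at, and the flags are standard Hadoop fsck options -- prints per-file block detail so you can see exactly which files, if any, have missing blocks:

```shell
# Sketch (needs a running HDFS): report filesystem health under the
# HBase root, listing files, their blocks, and which datanodes hold
# each replica.
./bin/hadoop fsck /hbase -files -blocks -locations
```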

Any events on your cluster that might have affected HDFS? A tsunami hit? Or in your case, it wouldn't take much since replication was set to one -- did a host crash?

3. Does HBase depend on replication for normal operation? In other words, will it work without replication enabled?

It'll work fine without replication until you lose data. Thereafter, it'll be hobbled by files with holes in them -- where the holes are blocks that sat on the downed server.

Would suggest running with replication of 3 unless you have very good reason -- and insurance against failure -- for doing otherwise.
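To get back to the default, something like the following -- a sketch for a live cluster: dfs.replication is the standard HDFS setting, and the -setrep shell command is needed because the config change only applies to files written after it:

```shell
# Sketch (needs a running HDFS). First set dfs.replication to 3 in
# conf/hadoop-site.xml so new files get three replicas, then bump
# files that already exist under the HBase root: -R recurses,
# -w waits for re-replication to finish before returning.
./bin/hadoop dfs -setrep -R -w 3 /hbase
```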

Go easy Dru,
St.Ack


On Sep 29, 2008, at 1:27 PM, stack wrote:

Dru Jensen wrote:
HBase was not responding to Thrift requests so I tried to restart but it still looks frozen. I am seeing several error messages in the hmaster logs after I attempted to restart hbase:

2008-09-29 12:55:23,744 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: error opening region {table},{key},1222453917858
java.io.IOException: Premeture EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
    at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:967)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)

Enable DEBUG and it might tell you what it was trying to open at time of the exception.


and:

2008-09-29 12:58:50,067 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: error opening region {table},{key},1222453917858
java.io.IOException: Could not obtain block: blk_-2905695662732817278

This is bad. What happens if you run './bin/hadoop fsck /hbase.rootdir'? Your replication is one. Means if any hdfs hiccup, data is lost. You might try putting replication back to the default?



St.Ack


