Hello. I have some questions for the hbase-users (and developers) team. I'll post them in separate threads since they concern different subjects.
I read through the JIRA issue tracker, and I have a question about HBASE-728 (http://issues.apache.org/jira/browse/HBASE-728). I was wondering in which cases data loss is possible, and what the impact on row content is.

I run a small cluster (see below for configuration details), and I discovered that a single column family was lost across about 50 rows, which could correspond to one MapFile (?). Could this be linked to HBASE-728? Note that the remaining data of those rows was still present.

To avoid this in the future, I'm trying to work out what I did that led to such a failure. I have a couple of hypotheses; maybe someone can tell me whether they are plausible:

a) Killing the regionserver during the shutdown thread

When I stopped the HBase cluster, I had to wait about 5 minutes for the stop-hbase script to return. After that, one of my regionservers was still running. According to top, it was using 99% CPU. I waited for a while (about 15 minutes) and eventually decided to kill the process.

I noticed the following in the log. Last lines before I killed (SIGINT) the process:

--- region server log ---
2008-10-14 13:07:13,606 INFO org.mortbay.util.Container: Stopped [EMAIL PROTECTED]
2008-10-14 13:07:13,607 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/0:0:0:0:0:0:0:0:60020.compactor exiting
2008-10-14 13:07:13,608 INFO org.apache.hadoop.hbase.Leases: regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closing leases
2008-10-14 13:07:13,609 INFO org.apache.hadoop.hbase.Leases: regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closed leases

This seems normal, except that the shutdown thread had not been started. When I sent the INT signal, the following line was logged:

--- region server log ---
2008-10-14 13:23:03,948 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.

But (big mistake) I did not notice it, and since the process was still running, I sent the KILL signal. The shutdown thread had no time to finish.
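The mechanism involved here can be sketched as follows. This is only a minimal illustration of how a JVM shutdown hook behaves, not HBase's actual code: the class name ShutdownHookSketch, the flushed flag and the newShutdownHook() helper are hypothetical names made up for the example. The JVM runs registered hooks on SIGINT, SIGTERM or a normal exit, but SIGKILL cannot be caught, so a hook that is still running at that point is simply cut off.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownHookSketch {
    // Hypothetical stand-in for "in-memory data has been written to disk".
    static final AtomicBoolean flushed = new AtomicBoolean(false);

    // Builds the hook thread. The JVM starts it on SIGINT, SIGTERM or normal exit.
    // It is never started (or is cut off mid-run) when the process gets SIGKILL.
    static Thread newShutdownHook() {
        return new Thread(() -> {
            System.out.println("Starting shutdown thread.");
            flushed.set(true); // a real server would close its files and flush caches here
            System.out.println("Shutdown thread complete.");
        }, "shutdown-hook");
    }

    public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(newShutdownHook());
        System.out.println("Running; send SIGINT (kill -INT <pid>) to trigger the hook.");
        Thread.sleep(60_000); // the hook also runs when the JVM exits normally after this sleep
    }
}
```

Running this and sending kill -INT prints both hook messages before the process exits; sending kill -KILL instead ends the process without either message, or between the two if the hook had already started.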
Was it a deadlock (HBASE-500, http://issues.apache.org/jira/browse/HBASE-500)? If so, does a deadlock in Java use all the CPU (some kind of while { trylock } loop in the code)?

b) HDFS errors

I often noticed messages like the following. I assumed they were normal and could not lead to data loss.

--- region server log ---
2008-10-14 12:03:52,267 WARN org.apache.hadoop.dfs.DFSClient: Exception while reading from blk_-9054609689772898417_200511 of /hbase/table-0.3/1 790941809/bytes/mapfiles/7306020330727690009/data from 192.168.1.15:50010: java.io.IOException: Premeture EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)

Thanks for your work and your advice.

-- Jean-Adrien

Cluster setup:
4 regionservers / datanodes
1 is master / namenode as well
java-6-sun
Total size of HDFS: 81.98 GB (replication factor 3)
fsck -> healthy
hadoop: 0.18.1
hbase: 0.18.0 (jar of hadoop replaced with 0.18.1)
1 GB RAM per node

--
View this message in context: http://www.nabble.com/Dataloss-HBASE-728-tp20013732p20013732.html
Sent from the HBase User mailing list archive at Nabble.com.
