Dataloss HBASE-728

Jean-Adrien Thu, 16 Oct 2008 06:26:40 -0700

Hello.
I have some questions for the hbase-users (and developers) team. I'll post
them in separate threads since they concern different subjects.


I read through the JIRA issue tracker, and I have a question about
HBASE-728 ( http://issues.apache.org/jira/browse/HBASE-728 ).

I was wondering in which cases data loss is possible, and what the impact
on row content is.
I run a small cluster (see below for configuration details), and I
discovered that a single column family was lost over about 50 rows, which
could correspond to one MapFile (?). Could this be linked to HBASE-728? Note
that the remaining data of those rows was still present.

To avoid this in the future, I'd like to know what I did that could lead to
such a failure. I have a couple of hypotheses; maybe someone can tell me
whether they are plausible:

a) Killing the regionserver while the shutdown thread was running

- When I stopped the hbase cluster, I had to wait about 5 minutes for the
stop-hbase script to return. After that, one of my regionservers was still
running. According to top, it was using 99% CPU. I waited for a while
(about 15 minutes) and eventually decided to kill the process.

I noticed the following in the log. These are the last lines before I killed
(SIGINT) the process:


--- region server log ---
2008-10-14 13:07:13,606 INFO org.mortbay.util.Container: Stopped
[EMAIL PROTECTED]
2008-10-14 13:07:13,607 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/0:0:0:0:0:0:0:0:60020.compactor exiting
2008-10-14 13:07:13,608 INFO org.apache.hadoop.hbase.Leases:
regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closing leases
2008-10-14 13:07:13,609 INFO org.apache.hadoop.hbase.Leases:
regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closed leases


This seems normal, except that the shutdown thread was not started.
When I sent the INT signal, the following line was logged:


--- region server log ---
2008-10-14 13:23:03,948 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
thread.


But (big mistake) I did not notice it, and since the process was still
running, I sent the KILL signal. The shutdown thread had no time to finish.
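
For reference, here is a minimal Java sketch of why the order of the two
signals matters. It assumes that the "shutdown thread" in the log is a
standard JVM shutdown hook; that is an assumption, not something verified
against the HBase source. The class and method names are made up for
illustration.

```java
class ShutdownHookSketch {

    // Registers a JVM shutdown hook. The JVM runs it on normal exit and on
    // SIGINT / SIGTERM, but it never gets a chance to run on SIGKILL.
    static Thread registerHook(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "shutdown-hook-sketch");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical cleanup; a real server would flush and close its files here.
        registerHook(() -> System.out.println("Starting shutdown thread."));

        // Optional first argument: how many seconds to stay alive (default 0).
        long seconds = args.length > 0 ? Long.parseLong(args[0]) : 0;
        System.out.println("Running for " + seconds + " s");
        Thread.sleep(seconds * 1000);
    }
}
```

Started with e.g. `java ShutdownHookSketch 600`, the process should print
the hook message when it receives `kill -INT <pid>`, and print nothing when
it receives `kill -KILL <pid>`. A KILL sent while the hook is still working
stops it midway in the same way.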

Was it a deadlock (HBASE-500, http://issues.apache.org/jira/browse/HBASE-500 )?
If so, does a deadlock in Java use all the CPU? (Some kind of
while { trylock } loop in the code?)
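
To make the question concrete, here is a minimal Java sketch of the two
situations. It is not taken from the HBase code, and the class and method
names are made up for illustration. A thread stuck waiting for a monitor, as
in a classic deadlock, is parked in the BLOCKED state and should use
essentially no CPU, while a thread looping on tryLock() stays RUNNABLE and
can keep a core busy.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class LockStateSketch {

    // Polls until the thread reaches the wanted state, or gives up after about 2 seconds.
    static Thread.State awaitState(Thread t, Thread.State wanted) throws InterruptedException {
        long deadline = System.currentTimeMillis() + 2000;
        while (t.getState() != wanted && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        return t.getState();
    }

    // A thread waiting for a monitor held by someone else (as in a classic deadlock).
    static Thread.State blockedWaiterState() throws InterruptedException {
        final Object monitor = new Object();
        synchronized (monitor) {
            Thread waiter = new Thread(() -> {
                synchronized (monitor) { /* not entered while the caller holds the monitor */ }
            });
            waiter.setDaemon(true);
            waiter.start();
            return awaitState(waiter, Thread.State.BLOCKED);
        }
    }

    // A thread that keeps retrying tryLock() on a lock held by someone else.
    static Thread.State spinningWaiterState() throws InterruptedException {
        final ReentrantLock lock = new ReentrantLock();
        final AtomicBoolean stop = new AtomicBoolean(false);
        lock.lock();
        try {
            Thread spinner = new Thread(() -> {
                while (!stop.get()) {          // the "while { trylock }" pattern
                    if (lock.tryLock()) {
                        lock.unlock();
                        return;
                    }
                }
            });
            spinner.setDaemon(true);
            spinner.start();
            Thread.sleep(100);                 // let it spin for a moment
            Thread.State state = spinner.getState();
            stop.set(true);                    // end the spin loop
            return state;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiting on a held monitor: " + blockedWaiterState());
        System.out.println("spinning on tryLock():     " + spinningWaiterState());
    }
}
```

So a process pinned at 99% CPU looks more like a busy loop of the second
kind than a plain monitor deadlock. A thread dump (`jstack <pid>` or
`kill -QUIT <pid>`) taken before killing the process would show which
threads are RUNNABLE and where.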

b) HDFS errors

I often see messages like the following. I assumed they were normal and
could not lead to data loss.


--- region server log ---
2008-10-14 12:03:52,267 WARN org.apache.hadoop.dfs.DFSClient: Exception
while reading from blk_-9054609689772898417_200511 of /hbase/table-0.3/1
790941809/bytes/mapfiles/7306020330727690009/data from 192.168.1.15:50010:
java.io.IOException: Premeture EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)



Thanks for your work and your advice.

-- Jean-Adrien

Cluster setup:
4 regionservers / datanodes
1 of them is also the master / namenode.
java-6-sun
Total size of hdfs: 81.98 GB (replication factor 3)
fsck -> healthy
hadoop: 0.18.1
hbase: 0.18.0 (jar of hadoop replaced with 0.18.1)
1 GB RAM per node
