HDFS doesn't allow you to read partially written files; it reports the size as 0 until the file is properly closed, so under a crash scenario you are in trouble.
The best options right now are to:

- don't let HBase crash (not as crazy as it sounds)
- consider experimenting with some newer HDFS stuff
- wait for Hadoop 0.21

In the meantime, you will suffer loss if HBase regionservers crash. That is a crash as in hard crash; controlled shutdowns flush, and you don't lose data then.

sorry for the confusion!
-ryan

On Thu, Aug 13, 2009 at 10:56 PM, Chen Xinli <[email protected]> wrote:
> For the HLog, I find an interesting problem. I set the
> optionallogflushinterval to 10000, that's 10 seconds; but it flushes with
> an interval of 1 hour.
>
> After the HLog file is generated, I stop HDFS and then kill the HMaster and
> regionservers; then I start them all again, but the HMaster doesn't restore
> records from the HLog, so the records are lost again. Is there something wrong?
>
> 2009/8/14 Chen Xinli <[email protected]>
>
>> Thanks Daniel. As you said, the latest version has done much to avoid data
>> loss; would you please give some examples?
>>
>> I read the conf file and the API, and found some related functions:
>> 1. In hbase-default.xml, "hbase.regionserver.optionallogflushinterval" is
>> described as "Sync the HLog to the HDFS after this interval if it has not
>> accumulated enough entries to trigger a sync". I issued one update to my
>> table, but there were no HLog files after the specified interval.
>> Does this setting not work, or am I misunderstanding it?
>>
>> 2. HBaseAdmin.flush(tableOrRegionName). It seems that this function flushes
>> the memcache to an HStoreFile. Should I call this function to avoid data
>> loss after several thousand updates?
>>
>> 3. In HTable, there is also a function flushCommits. Where does it flush
>> to? Memcache or HDFS?
>>
>> Actually we have a crawler, and want to store web pages (about 1 billion)
>> in HBase. What shall we do to avoid data loss? Any suggestion is appreciated.
>>
>> By the way, we use hadoop 0.19.1 + hbase 0.19.3.
>> Thanks
>>
>> 2009/8/6 Jean-Daniel Cryans <[email protected]>
>>
>>> Chen,
>>>
>>> The main problem is that appends are not supported in HDFS, so HBase
>>> simply cannot sync its logs to it. But we did some work to make that
>>> story better. The latest revision in the 0.19 branch and 0.20 RC1 both
>>> solve much of the data loss problem, but it won't be near perfect until
>>> we have appends (supposed to be available in 0.21).
>>>
>>> J-D
>>>
>>> On Thu, Aug 6, 2009 at 12:45 AM, Chen Xinli <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > I'm using hbase 0.19.3 on a cluster with 30 machines to store web data.
>>> > We had a power-off some days ago and I found that much web data was lost.
>>> > I searched Google and found it's a meta flush problem.
>>> >
>>> > I know there is much performance improvement in 0.20.0; is the data loss
>>> > problem handled in the new version?
>>> >
>>> > --
>>> > Best Regards,
>>> > Chen Xinli
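[Editor's note: for readers tuning the HLog sync interval discussed in this thread, the property Chen quotes from hbase-default.xml is overridden in hbase-site.xml. A minimal sketch, assuming the 0.19/0.20-era property name exactly as quoted above (the value is in milliseconds; 10000 = the 10 seconds Chen tried):]

```xml
<!-- hbase-site.xml fragment: sync the HLog to HDFS after this interval
     (in ms) if not enough entries have accumulated to trigger a sync.
     10000 ms = 10 seconds, matching the experiment described above. -->
<property>
  <name>hbase.regionserver.optionallogflushinterval</name>
  <value>10000</value>
</property>
```

Note that, per Ryan's and J-D's replies, on Hadoop releases without working append/sync (pre-0.21), a short flush interval does not guarantee durability: the tail of the log can still be unreadable after a hard regionserver crash regardless of this setting.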
