Is this needed with 0.20 too? I am skipping the WALs during imports, so that makes it even less fault-tolerant...
On 8/14/09, stack <[email protected]> wrote:
> Or just add the below to cron:
>
> echo "flush TABLENAME" | ./bin/hbase shell
>
> Or adjust the configuration in hbase so it flushes once a day (see
> hbase-default.xml for all options).
>
> St.Ack
>
> On Fri, Aug 14, 2009 at 2:13 AM, Chen Xinli <[email protected]> wrote:
>
>> Thanks for your suggestion.
>> As our insertion is daily, that is, we insert lots of records at a fixed
>> time, can we just call HBaseAdmin.flush to avoid loss?
>> I have done some experiments and found that it works. I wonder if it will
>> cause some other problem?
>>
>> 2009/8/14 Ryan Rawson <[email protected]>
>>
>>> HDFS doesn't allow you to read partially written files; it reports the
>>> size as 0 until the file is properly closed, so under a crash scenario
>>> you are in trouble.
>>>
>>> The best options right now are to:
>>> - don't let hbase crash (not as crazy as this sounds)
>>> - consider experimenting with some newer hdfs stuff
>>> - wait for hadoop 0.21
>>>
>>> In the meantime, you will suffer loss if hbase regionservers crash.
>>> That is a crash as in hard crash; controlled shutdowns flush, and you
>>> don't lose data then.
>>>
>>> sorry for the confusion!
>>> -ryan
>>>
>>> On Thu, Aug 13, 2009 at 10:56 PM, Chen Xinli <[email protected]> wrote:
>>>> For the Hlog, I found an interesting problem. I set
>>>> optionallogflushinterval to 10000, that is, 10 seconds; but it flushes
>>>> at an interval of 1 hour.
>>>>
>>>> After the hlog file was generated, I stopped hdfs and then killed the
>>>> hmaster and regionservers; then I started them all again, but the
>>>> hmaster didn't restore the records from the hlog, so the records were
>>>> lost again. Is there something wrong?
>>>>
>>>> 2009/8/14 Chen Xinli <[email protected]>
>>>>
>>>>> Thanks Daniel. As you said the latest version has done much to avoid
>>>>> data loss, would you please give some examples?
>>>>>
>>>>> I read the conf file and the API, and found some related functions:
>>>>> 1. In hbase-default.xml, "hbase.regionserver.optionallogflushinterval"
>>>>> is described as "Sync the HLog to the HDFS after this interval if it
>>>>> has not accumulated enough entries to trigger a sync". I issued one
>>>>> update to my table, but there were no hlog files after the specified
>>>>> interval. Does this setting not work, or am I misunderstanding it?
>>>>>
>>>>> 2. HBaseAdmin.flush(tableOrRegionName). It seems that this function
>>>>> flushes the memcache to an HStoreFile. Should I call this function to
>>>>> avoid data loss after every several thousand updates?
>>>>>
>>>>> 3. In HTable, there is also a function flushCommits. Where does it
>>>>> flush to? Memcache or hdfs?
>>>>>
>>>>> Actually we have a crawler, and we want to store webpages (about 1
>>>>> billion) in hbase. What shall we do to avoid data loss? Any suggestion
>>>>> is appreciated.
>>>>>
>>>>> By the way, we use hadoop 0.19.1 + hbase 0.19.3.
>>>>> Thanks
>>>>>
>>>>> 2009/8/6 Jean-Daniel Cryans <[email protected]>
>>>>>
>>>>>> Chen,
>>>>>>
>>>>>> The main problem is that appends are not supported in HDFS, so HBase
>>>>>> simply cannot sync its logs to it. But we did some work to make that
>>>>>> story better. The latest revision in the 0.19 branch and 0.20 RC1
>>>>>> both solve much of the data loss problem, but it won't be near
>>>>>> perfect until we have appends (supposed to be available in 0.21).
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Thu, Aug 6, 2009 at 12:45 AM, Chen Xinli <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm using hbase 0.19.3 on a cluster with 30 machines to store web
>>>>>>> data. We had a power-off some days ago and I found that much of the
>>>>>>> web data was lost. I have searched google and found it's a meta
>>>>>>> flush problem.
>>>>>>>
>>>>>>> I know there is much performance improvement in 0.20.0; is the data
>>>>>>> loss problem handled in the new version?
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Chen Xinli

--
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
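For reference, stack's cron suggestion above might look like the following crontab entry. This is only a sketch: the 03:00 schedule, the `/opt/hbase` install path, and the `webpages` table name are assumptions to be adapted to your setup; the `flush` command itself is the one quoted in the thread.

```shell
# Hypothetical crontab entry: flush the memcache of table "webpages"
# to HDFS once a day at 03:00, so a regionserver crash later in the
# day loses at most one day's worth of unflushed edits.
# Adjust the schedule, the HBase install path, and the table name.
0 3 * * *  echo "flush webpages" | /opt/hbase/bin/hbase shell
```

A time-based flush like this only bounds the loss window; as Ryan notes above, edits written after the last flush are still at risk until HDFS appends land in hadoop 0.21.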
