Thanks Daniel. You said the latest version has done much to avoid data loss; could you please give some examples?
I read the configuration file and the API and found some related settings and functions:

1. In hbase-default.xml, "hbase.regionserver.optionallogflushinterval" is described as "Sync the HLog to the HDFS after this interval if it has not accumulated enough entries to trigger a sync". I issued one update to my table, but there were no HLog files after the specified interval. Does this setting not work, or am I misunderstanding it?

2. HBaseAdmin.flush(tableOrRegionName). It seems this function flushes the memcache to an HStoreFile. Should I call it after every few thousand updates to avoid data loss?

3. HTable also has a flushCommits function. Where does it flush to: the memcache or HDFS?

Actually we have a crawler and want to store web pages (about 1 billion) in HBase. What should we do to avoid data loss? Any suggestions are appreciated. I have put a rough sketch of the write path I have in mind at the end of this message; please tell me if it is on the right track.

By the way, we use Hadoop 0.19.1 + HBase 0.19.3.

Thanks

2009/8/6 Jean-Daniel Cryans <[email protected]>

> Chen,
>
> The main problem is that appends are not supported in HDFS, HBase
> simply cannot sync its logs to it. But, we did some work to make that
> story better. The latest revision in the 0.19 branch and 0.20 RC1 both
> solve much of the data loss problem but it won't be near perfect until
> we have appends (supposed to be available in 0.21).
>
> J-D
>
> On Thu, Aug 6, 2009 at 12:45 AM, Chen Xinli <[email protected]> wrote:
> > Hi,
> >
> > I'm using hbase 0.19.3 on a cluster with 30 machines to store web data.
> > We got a poweroff days before and I found much web data lost. I have
> > searched google, and find it's a meta flush problem.
> >
> > I know there is much performance improvement in 0.20.0; Is the data lost
> > problem handled in the new version?
> >
> > --
> > Best Regards,
> > Chen Xinli
> >

--
Best Regards,
Chen Xinli
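
P.S. Here is the minimal sketch I mentioned above, assuming the 0.19 client API (BatchUpdate, HTable.commit, setAutoFlush, flushCommits, HBaseAdmin.flush); the table name "webpages", the column "content:html", and the helper methods are made up just for illustration. Please correct me if these are the wrong calls:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class CrawlerStore {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "webpages");   // hypothetical table name
    table.setAutoFlush(false);                     // buffer updates on the client side

    long count = 0;
    for (String url : crawledUrls()) {             // placeholder for our crawler loop
      BatchUpdate update = new BatchUpdate(url);   // row key = URL
      update.put("content:html", pageBytes(url));  // hypothetical column
      table.commit(update);                        // goes into the client write buffer

      if (++count % 1000 == 0) {
        // Push the buffered updates to the region servers every 1000 rows;
        // as I understand it this does not by itself force anything onto HDFS.
        table.flushCommits();
      }
    }
    table.flushCommits();

    // Ask the region servers to flush the memcache to HStoreFiles on HDFS.
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.flush("webpages");
  }

  // Placeholders so the sketch is self-contained; the real crawler supplies these.
  private static Iterable<String> crawledUrls() { return java.util.Collections.emptyList(); }
  private static byte[] pageBytes(String url) { return new byte[0]; }
}

Is flushCommits() plus an occasional HBaseAdmin.flush() the right combination to minimize loss, or is there something better in 0.19.3/0.20?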
