Thanks for your suggestion.
Our insertions happen in a daily batch, i.e. we insert a large number of
records at a fixed time. Can we simply call HBaseAdmin.flush after each
batch to avoid the loss?
I have run some experiments and it seems to work. Could it cause any
other problems?
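
For reference, roughly what I tried (a minimal sketch against the 0.19
client API; the table name "webpages", the row key and the column here
are just placeholders, not our real schema):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;

    public class DailyLoad {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, "webpages");

            // the daily batch of inserts (only one shown here)
            BatchUpdate update = new BatchUpdate("row-1");
            update.put("content:html", "<html>...</html>".getBytes());
            table.commit(update);

            // push any client-side buffered edits to the regionservers
            table.flushCommits();

            // then ask the regionservers to flush the memcache for the
            // table out to HStoreFiles on HDFS
            HBaseAdmin admin = new HBaseAdmin(conf);
            admin.flush("webpages");
        }
    }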


2009/8/14 Ryan Rawson <[email protected]>

> HDFS doesn't allow you to read partially written files; it reports the
> size as 0 until the file is properly closed, so under a crash scenario
> you are in trouble.
>
> The best options right now are to:
> - don't let hbase crash (not as crazy as this sounds)
> - consider experimenting with some newer hdfs stuff
> - wait for hadoop 0.21
>
> In the meantime, you will suffer loss if hbase regionservers crash.
> That means a hard crash; controlled shutdowns flush, and you don't
> lose data then.
>
> sorry for the confusion!
> -ryan
>
> On Thu, Aug 13, 2009 at 10:56 PM, Chen Xinli<[email protected]> wrote:
> > Regarding the HLog, I found an interesting problem. I set
> > optionallogflushinterval to 10000, i.e. 10 seconds, but it flushes at
> > an interval of 1 hour.
> >
> > After the HLog file was generated, I stopped HDFS and then killed the
> > HMaster and regionservers; when I started everything again, the
> > HMaster did not restore the records from the HLog, so the records
> > were lost again. Is there something wrong?
> >
> >
> > 2009/8/14 Chen Xinli <[email protected]>
> >
> >> Thanks Daniel. You said the latest version has done much to avoid
> >> data loss; would you please give some examples?
> >>
> >> I read the conf file and the API and found some related settings and
> >> functions:
> >> 1. In hbase-default.xml, "hbase.regionserver.optionallogflushinterval"
> >> is described as "Sync the HLog to the HDFS after this interval if it
> >> has not accumulated enough entries to trigger a sync". I issued one
> >> update to my table, but there were no HLog files after the specified
> >> interval. Does this setting not work, or am I misunderstanding it?
> >>
> >> 2. HBaseAdmin.flush(tableOrRegionName). It seems that this function
> >> flushes the memcache to HStoreFiles. Should I call it every few
> >> thousand updates to avoid data loss?
> >>
> >> 3. In HTable, there is also a function flushCommits. Where does it
> >> flush to, the memcache or HDFS?
> >>
> >> We actually have a crawler and want to store the web pages (about
> >> 1 billion) in HBase. What should we do to avoid data loss? Any
> >> suggestion is appreciated.
> >>
> >> By the way, we use Hadoop 0.19.1 + HBase 0.19.3.
> >> Thanks
> >>
> >> 2009/8/6 Jean-Daniel Cryans <[email protected]>
> >>
> >> Chen,
> >>>
> >>> The main problem is that appends are not supported in HDFS, so
> >>> HBase simply cannot sync its logs to it. But we did some work to
> >>> make that story better. The latest revision in the 0.19 branch and
> >>> 0.20 RC1 both solve much of the data loss problem, but it won't be
> >>> near perfect until we have appends (supposed to be available in
> >>> 0.21).
> >>>
> >>> J-D
> >>>
> >>> > On Thu, Aug 6, 2009 at 12:45 AM, Chen Xinli<[email protected]> wrote:
> >>> > Hi,
> >>> >
> >>> > I'm using HBase 0.19.3 on a cluster of 30 machines to store web
> >>> > data. We had a power outage a few days ago and I found that a lot
> >>> > of web data was lost. I searched Google and found it is a meta
> >>> > flush problem.
> >>> >
> >>> > I know there is much performance improvement in 0.20.0; is the
> >>> > data loss problem handled in the new version?
> >>> >
> >>> > --
> >>> > Best Regards,
> >>> > Chen Xinli
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> Best Regards,
> >> Chen Xinli
> >>
> >
> >
> >
> > --
> > Best Regards,
> > Chen Xinli
> >
>



-- 
Best Regards,
Chen Xinli
