Is this needed with 0.20 too? I am skipping the WALs during imports, so that makes it even less fault-tolerant...
On 8/14/09, stack <[email protected]> wrote:
> Or just add the below to cron:
>
> echo "flush TABLENAME" | ./bin/hbase shell
>
> Or adjust the configuration in hbase so it flushes once a day (see
> hbase-default.xml for all options).
>
> St.Ack
>
> On Fri, Aug 14, 2009 at 2:13 AM, Chen Xinli <[email protected]> wrote:
>
>> Thanks for your suggestion.
>> As our insertion is daily, that is, we insert lots of records at a fixed
>> time, can we just call HBaseAdmin.flush to avoid loss?
>> I have done some experiments and found that it works. I wonder if it will
>> cause some other problem?
>>
>> 2009/8/14 Ryan Rawson <[email protected]>
>>
>>> HDFS doesn't allow you to read partially written files; it reports the
>>> size as 0 until the file is properly closed, so under a crash scenario
>>> you are in trouble.
>>>
>>> The best options right now are to:
>>> - don't let hbase crash (not as crazy as this sounds)
>>> - consider experimenting with some newer hdfs stuff
>>> - wait for hadoop 0.21
>>>
>>> In the meantime, you will suffer loss if hbase regionservers crash.
>>> That is a crash as in hard crash; controlled shutdowns flush, and you
>>> don't lose data then.
>>>
>>> sorry for the confusion!
>>> -ryan
>>>
>>> On Thu, Aug 13, 2009 at 10:56 PM, Chen Xinli <[email protected]> wrote:
>>>> For the Hlog, I found an interesting problem. I set
>>>> optionallogflushinterval to 10000, that is, 10 seconds; but it flushes
>>>> at an interval of 1 hour.
>>>>
>>>> After the hlog file was generated, I stopped hdfs and then killed the
>>>> hmaster and regionservers; then I started them all again, but the
>>>> hmaster didn't restore the records from the hlog, so the records were
>>>> lost again. Is there something wrong?
>>>>
>>>> 2009/8/14 Chen Xinli <[email protected]>
>>>>
>>>>> Thanks Daniel. As you said the latest version has done much to avoid
>>>>> data loss, would you please give some examples?
>>>>>
>>>>> I read the conf file and the API, and found some related functions:
>>>>> 1. In hbase-default.xml, "hbase.regionserver.optionallogflushinterval"
>>>>> is described as "Sync the HLog to the HDFS after this interval if it
>>>>> has not accumulated enough entries to trigger a sync". I issued one
>>>>> update to my table, but there were no hlog files after the specified
>>>>> interval. Does this setting not work, or am I misunderstanding it?
>>>>>
>>>>> 2. HBaseAdmin.flush(tableOrRegionName). It seems that this function
>>>>> flushes the memcache to an HStoreFile. Should I call this function to
>>>>> avoid data loss after every several thousand updates?
>>>>>
>>>>> 3. In HTable, there is also a function flushCommits. Where does it
>>>>> flush to? Memcache or hdfs?
>>>>>
>>>>> Actually we have a crawler, and we want to store webpages (about 1
>>>>> billion) in hbase. What shall we do to avoid data loss? Any suggestion
>>>>> is appreciated.
>>>>>
>>>>> By the way, we use hadoop 0.19.1 + hbase 0.19.3.
>>>>> Thanks
>>>>>
>>>>> 2009/8/6 Jean-Daniel Cryans <[email protected]>
>>>>>
>>>>>> Chen,
>>>>>>
>>>>>> The main problem is that appends are not supported in HDFS, so HBase
>>>>>> simply cannot sync its logs to it. But we did some work to make that
>>>>>> story better. The latest revision in the 0.19 branch and 0.20 RC1
>>>>>> both solve much of the data loss problem, but it won't be near
>>>>>> perfect until we have appends (supposed to be available in 0.21).
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Thu, Aug 6, 2009 at 12:45 AM, Chen Xinli <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm using hbase 0.19.3 on a cluster with 30 machines to store web
>>>>>>> data. We had a power-off some days ago and I found that much of the
>>>>>>> web data was lost. I have searched google and found it's a meta
>>>>>>> flush problem.
>>>>>>>
>>>>>>> I know there is much performance improvement in 0.20.0; is the data
>>>>>>> loss problem handled in the new version?
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Chen Xinli

--
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
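For reference, stack's cron suggestion above might look like the following crontab entry. This is only a sketch: the 03:00 schedule, the `/opt/hbase` install path, and the `webpages` table name are assumptions to be adapted to your setup; the `flush` command itself is the one quoted in the thread.

```shell
# Hypothetical crontab entry: flush the memcache of table "webpages"
# to HDFS once a day at 03:00, so a regionserver crash later in the
# day loses at most one day's worth of unflushed edits.
# Adjust the schedule, the HBase install path, and the table name.
0 3 * * *  echo "flush webpages" | /opt/hbase/bin/hbase shell
```

A time-based flush like this only bounds the loss window; as Ryan notes above, edits written after the last flush are still at risk until HDFS appends land in hadoop 0.21.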
