HDFS doesn't allow you to read partially written files; it reports the
size as 0 until the file is properly closed, so under a crash scenario
you are in trouble.

The best options right now are to:
- don't let hbase crash (not as crazy as this sounds)
- consider experimenting with some newer hdfs stuff
- wait for hadoop 0.21

in the meantime, you will suffer loss if hbase regionservers crash.
That is a crash as in hard crash; controlled shutdowns flush, and you
don't lose data then.
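to be clear about the layers asked about below: HTable.flushCommits() only moves edits from the client-side write buffer to the regionserver's memcache, and HBaseAdmin.flush() pushes memcache out to HStorefiles on hdfs. here's a toy sketch of that layering (NOT the real hbase api, just the idea; all class and method names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the HBase write path (hypothetical classes, not the real API):
// puts land in a client-side buffer; flushCommits() ships them to the
// regionserver's in-memory "memcache"; only a region flush persists them
// as an HStorefile on HDFS.
class ToyHTable {
    private final List<String> clientBuffer = new ArrayList<>();
    final List<String> memcache = new ArrayList<>();    // held by the regionserver, lost on hard crash
    final List<String> hstorefiles = new ArrayList<>(); // durable on HDFS
    private boolean autoFlush = true;

    void setAutoFlush(boolean on) { autoFlush = on; }

    void put(String row) {
        clientBuffer.add(row);
        if (autoFlush) flushCommits(); // default: every put is sent immediately
    }

    // Analogue of HTable.flushCommits(): client buffer -> memcache.
    // Edits are still only in regionserver memory after this call.
    void flushCommits() {
        memcache.addAll(clientBuffer);
        clientBuffer.clear();
    }

    // Rough analogue of HBaseAdmin.flush(): memcache -> HStorefile (durable).
    void flushRegion() {
        hstorefiles.addAll(memcache);
        memcache.clear();
    }
}

public class WritePathDemo {
    public static void main(String[] args) {
        ToyHTable t = new ToyHTable();
        t.setAutoFlush(false);
        t.put("row1");
        t.put("row2");
        System.out.println("memcache before flushCommits: " + t.memcache.size()); // 0
        t.flushCommits();
        System.out.println("memcache after flushCommits: " + t.memcache.size());  // 2
        t.flushRegion();
        System.out.println("hstorefiles after region flush: " + t.hstorefiles.size()); // 2
    }
}
```

the point of the sketch: a regionserver hard crash wipes the memcache list, which is why only data already flushed to hstorefiles (or recoverable from a synced hlog, which hdfs can't give us yet) survives.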

sorry for the confusion!
-ryan

On Thu, Aug 13, 2009 at 10:56 PM, Chen Xinli<[email protected]> wrote:
> For the Hlog, I found an interesting problem. I set
> optionallogflushinterval to 10000, that's 10 seconds; but it flushes at
> an interval of 1 hour.
>
> After the hlog file is generated, I stop hdfs and then kill the hmaster and
> regionservers; then I start everything again, but the hmaster doesn't restore
> records from the hlog, so the records are lost again. Is something wrong?
>
>
> 2009/8/14 Chen Xinli <[email protected]>
>
>> Thanks Daniel. Since you said the latest version has done much to avoid data
>> loss, would you please give some examples?
>>
>> I read the conf file and api, and find some functions related:
>> 1. in hbase-default.xml, "hbase.regionserver.optionallogflushinterval"
>> described as "Sync the HLog to the HDFS after this interval if it has not
>> accumulated enough entries to trigger a sync". I issued one update to my
>> table, but there are no hlog files after the specified interval.
>> Does this setting not work, or am I misunderstanding it?
>>
>> 2. HbaseAdmin.flush(tableOrRegionName). It seems that this function flushes
>> the memcache to an HStorefile. Should I call this function to avoid data
>> loss after every several thousand updates?
>>
>> 3. In Htable, there is also a function flushCommits. Where does it flush
>> to? memcache or hdfs?
>>
>> Actually we have a crawler and want to store webpages (about 1 billion) in
>> hbase. What should we do to avoid data loss? Any suggestion is appreciated.
>>
>> By the way, we use hadoop 0.19.1 + hbase 0.19.3
>> Thanks
>>
>> 2009/8/6 Jean-Daniel Cryans <[email protected]>
>>
>> Chen,
>>>
>>> The main problem is that appends are not supported in HDFS, so HBase
>>> simply cannot sync its logs to it. But we did some work to make that
>>> story better. The latest revision in the 0.19 branch and 0.20 RC1 both
>>> solve much of the data loss problem but it won't be near perfect until
>>> we have appends (supposed to be available in 0.21).
>>>
>>> J-D
>>>
>>> On Thu, Aug 6, 2009 at 12:45 AM, Chen Xinli<[email protected]> wrote:
>>> > Hi,
>>> >
>>> > I'm using hbase 0.19.3 on a cluster with 30 machines to store web data.
>>> > We had a power outage a few days ago, and I found that much web data was
>>> > lost. I searched Google and found it's a meta flush problem.
>>> >
>>> > I know there is much performance improvement in 0.20.0; is the data loss
>>> > problem handled in the new version?
>>> >
>>> > --
>>> > Best Regards,
>>> > Chen Xinli
>>> >
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Chen Xinli
>>
>
>
>
> --
> Best Regards,
> Chen Xinli
>
