Yes, you can definitely do that.
We have tables that we handle in exactly that way. Flushing the table ensures all data is written to HDFS, so you will not have any data loss under HBase fault scenarios.
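For example, something like this after the daily load finishes (a rough sketch against the 0.19/0.20-era Java client; the "webpages" table name is just a placeholder):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class FlushAfterBulkLoad {
    public static void main(String[] args) throws Exception {
      HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

      // After the daily bulk insert finishes, ask the region servers to
      // flush the table's in-memory edits out to store files on HDFS.
      admin.flush("webpages");
    }
  }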
Chen Xinli wrote:
Thanks for your suggestion.
As our insertion happens daily, that is, we insert a large number of records at a fixed time, can we just call HBaseAdmin.flush afterwards to avoid the loss?
I have done some experiments and found that it works. I wonder whether it will cause any other problems?
2009/8/14 Ryan Rawson <[email protected]>
HDFS doesn't allow you to read partially written files; it reports the size as 0 until the file is properly closed, so under a crash scenario you are in trouble.
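To illustrate the behavior described above, here is a minimal sketch against the plain Hadoop FileSystem API on pre-append HDFS (the path is made up):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class PartialWriteDemo {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      Path p = new Path("/tmp/partial-write-demo");

      FSDataOutputStream out = fs.create(p);
      out.write(new byte[1024 * 1024]);  // 1 MB written, file not yet closed

      // On pre-append HDFS a reader sees the length as 0 until close(),
      // so anything in an unclosed log file is invisible after a hard crash.
      System.out.println("before close: " + fs.getFileStatus(p).getLen());

      out.close();
      System.out.println("after close:  " + fs.getFileStatus(p).getLen());
    }
  }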
The best options right now are to:
- don't let HBase crash (not as crazy as this sounds)
- consider experimenting with some newer HDFS stuff
- wait for Hadoop 0.21
In the meantime, you will suffer loss if HBase regionservers crash. That is a crash as in a hard crash; controlled shutdowns flush, and you don't lose data then.
sorry for the confusion!
-ryan
On Thu, Aug 13, 2009 at 10:56 PM, Chen Xinli<[email protected]> wrote:
Regarding the HLog, I found an interesting problem. I set optionallogflushinterval to 10000, that is, 10 seconds; but it flushes at an interval of 1 hour.
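A quick way to double-check which value actually applies is to print it after the client configuration is loaded; a sketch only, and note that the override has to live in hbase-site.xml on the regionservers' classpath and is only picked up when they are restarted:

  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class ShowFlushInterval {
    public static void main(String[] args) {
      // HBaseConfiguration loads hbase-default.xml and then hbase-site.xml
      // from the classpath, so this prints the effective value.
      HBaseConfiguration conf = new HBaseConfiguration();
      System.out.println("hbase.regionserver.optionallogflushinterval = "
          + conf.get("hbase.regionserver.optionallogflushinterval", "(not set)"));
    }
  }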
After the HLog file is generated, I stop HDFS and then kill the HMaster and regionservers; when I start everything again, the HMaster doesn't restore the records from the HLog, so the records are lost again. Is there something wrong?
2009/8/14 Chen Xinli <[email protected]>
Thanks, Daniel. Since you said the latest version does much to avoid data loss, would you please give some examples?
I read the conf file and the API, and found some related functions:
1. In hbase-default.xml, "hbase.regionserver.optionallogflushinterval" is described as "Sync the HLog to the HDFS after this interval if it has not accumulated enough entries to trigger a sync". I issued one update to my table, but there were no HLog files after the specified interval. Does this setting not work, or am I misunderstanding it?
2. HBaseAdmin.flush(tableOrRegionName). It seems that this function flushes the memcache to an HStoreFile. Should I call this function to avoid data loss after several thousand updates?
3. In HTable, there is also a function flushCommits. Where does it flush to: the memcache or HDFS? (See the sketch just after this list.)
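On item 3, for what it's worth, a minimal sketch, assuming the 0.20-style client API and a hypothetical "webpages" table with a "content" family: flushCommits only pushes the client-side write buffer over to the region servers, i.e. into their memcache; writing that data down to HDFS is the separate memcache flush that HBaseAdmin.flush from item 2 triggers.

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BufferedWrites {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "webpages");
      table.setAutoFlush(false);  // buffer puts on the client side

      Put put = new Put(Bytes.toBytes("http://example.com/"));
      put.add(Bytes.toBytes("content"), Bytes.toBytes("html"),
              Bytes.toBytes("<html>...</html>"));
      table.put(put);

      // Sends the buffered edits to the region servers (memcache only);
      // this does not by itself persist anything to HDFS.
      table.flushCommits();
    }
  }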
Actually, we have a crawler and want to store web pages (about 1 billion of them) in HBase. What should we do to avoid data loss? Any suggestion is appreciated.
By the way, we use Hadoop 0.19.1 + HBase 0.19.3.
Thanks
2009/8/6 Jean-Daniel Cryans <[email protected]>
Chen,
The main problem is that appends are not supported in HDFS, so HBase simply cannot sync its logs to it. But we did some work to make that story better. The latest revision in the 0.19 branch and 0.20 RC1 both solve much of the data loss problem, but it won't be anywhere near perfect until we have appends (supposed to be available in 0.21).
J-D
On Thu, Aug 6, 2009 at 12:45 AM, Chen Xinli <[email protected]> wrote:
Hi,
I'm using HBase 0.19.3 on a cluster of 30 machines to store web data.
We had a power outage a few days ago, and I found that much of the web data was lost. I searched Google and found that it is a meta flush problem.
I know there are many performance improvements in 0.20.0; is the data loss problem handled in the new version?
--
Best Regards,
Chen Xinli