Hi Taeho,

Fortunately for us, we don't have a need for storing millions of files in
HDFS just yet. We are adding only a few thousand files a day, so it will be
a while before we get anywhere near that. And we've been using Hadoop for
more than a year, and its reliability has been superb.

Sounds great.

This is just a rough estimate, but we see that about 1GB of namenode RAM
is required for every 1 million files. Newer versions of Hadoop have a more
optimized namenode, so it can host more files, but to be conservative we
treat 6-7 million files as the limit for an 8GB namenode machine.

Ah, that would explain why my first attempt failed: I am running a namenode with 1GB of RAM. It worked OK up to about 3 million files and then died completely. I am now using a nightly build of Hadoop/HBase; does that mean I am in better shape, and how much better does it perform?

I don't think adding the "consolidation" feature to Hadoop itself is a good
idea. As I said, you may have to add a "layer" that does the consolidation
work, and use that layer only when necessary.

Yes, of course, that is what I meant: we have to handle the creation of the slaps on our end. But that is where I think we would have to reinvent the wheel, so to speak.
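Just so we are talking about the same thing, here is the rough shape of the layer I have in mind (pure sketch, untested, and none of these names exist in Hadoop or anywhere else):

// Hypothetical consolidation layer, not an existing Hadoop API.
// store() appends a small file to the current slap and records where it went;
// retrieve() uses that record (kept in the DB) to read the file back out.
public interface SlapStore {

    /** Appends a small file to the current slap and returns its location. */
    SlapEntry store(String fileName, byte[] content) throws java.io.IOException;

    /** Reads a single file back out of its slap using the recorded location. */
    byte[] retrieve(SlapEntry entry) throws java.io.IOException;

    /** What we would keep in the DB for each stored file. */
    class SlapEntry {
        public final String slapPath;  // HDFS path of the slap the file lives in
        public final long offset;      // byte offset of the file within that slap
        public final long length;      // length of the file in bytes

        public SlapEntry(String slapPath, long offset, long length) {
            this.slapPath = slapPath;
            this.offset = offset;
            this.length = length;
        }
    }
}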

As far as the performance is concerned, I don't think it's much of an issue.
The only cost I can think of is the time taken to make a query to a DB plus
some time to find the desired file from a given "slap."

OK, my concern is more the size of each slap. Doing some quick math (correct me if I am wrong), 80TB of total storage divided by, say, a maximum of 1 million slaps means roughly 83MB per slap. That is quite a chunk to load, unless I can do a positioned read of just the part I need out of a slap. Does Hadoop support seeking to an offset when reading a file?
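Actually, answering part of my own question: FSDataInputStream does have seek() and positioned reads, so I am hoping something like this would pull in only the file I need instead of the whole slap (untested; the path, offset and length are made up and would really come from the DB lookup you mentioned):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SlapRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Made-up location of one small file inside a slap; in practice this
        // comes out of the DB that maps file name -> (slap, offset, length).
        Path slap = new Path("/slaps/slap-00042.dat");
        long offset = 12345678L;
        int length = 64 * 1024;

        byte[] buf = new byte[length];
        FSDataInputStream in = fs.open(slap);
        try {
            // Positioned read: fetches only the requested byte range of the
            // slap, not all 80+ MB of it.
            in.readFully(offset, buf, 0, length);
        } finally {
            in.close();
        }
        // buf now holds the original small file's content.
    }
}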

Also, you may want to create slaps in such a way that no single file spans
more than one slap.

Yes, that makes sense. I could, for example, simply append files together, like an mbox, or use a ZIP archive. First I would cache enough files in a scratch directory in Hadoop and then archive them as one slap. (Again, that sounds similar to what HBase is doing?)
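For the append-files-together variant I am picturing something along these lines (rough, untested sketch; SlapWriter is a made-up name, and the offset map would of course end up in the DB rather than stay in memory):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class SlapWriter {

    /**
     * Concatenates every file under scratchDir into a single slap file and
     * returns a map of original file name -> byte offset within the slap.
     * Lengths can be taken from the files' FileStatus entries.
     */
    public static Map<String, Long> writeSlap(FileSystem fs, Path scratchDir,
            Path slap) throws IOException {
        Map<String, Long> offsets = new HashMap<String, Long>();
        FSDataOutputStream out = fs.create(slap);
        try {
            for (FileStatus status : fs.listStatus(scratchDir)) {
                if (status.isDir()) {
                    continue;  // only plain files go into the slap
                }
                // Remember where this file starts inside the slap.
                offsets.put(status.getPath().getName(), out.getPos());
                FSDataInputStream in = fs.open(status.getPath());
                try {
                    IOUtils.copyBytes(in, out, 64 * 1024, false);
                } finally {
                    in.close();
                }
            }
        } finally {
            out.close();
        }
        return offsets;
    }
}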

Updates... woo, here we go again. Hadoop is not designed to handle this
need. Basically, its HDFS is designed for large files that rarely change...

Yes, understood. I could think of replacing whole slaps, or deleting slaps once all the files they contain are obsolete.
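For replacing a whole slap I would probably write the new version to a temporary path first and then swap it in, roughly like this (sketch only; the two steps together are not atomic):

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SlapReplace {

    /** Swaps a freshly written slap (tmpSlap) in over the live one. */
    public static void replaceSlap(FileSystem fs, Path tmpSlap, Path liveSlap)
            throws IOException {
        // Remove the old slap first; FileSystem.rename() will not overwrite
        // an existing destination file on its own.
        if (fs.exists(liveSlap)) {
            fs.delete(liveSlap, false);  // false = non-recursive, it is a single file
        }
        if (!fs.rename(tmpSlap, liveSlap)) {
            throw new IOException("rename failed: " + tmpSlap + " -> " + liveSlap);
        }
    }
}

Deleting a slap once everything in it is obsolete would then just be fs.delete() plus clearing the matching rows in the DB.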

Let us know how your situation goes.

Will do.

Lars
