I think the determining factor in choosing HBase over HDFS files is really the consumption pattern. If you're only ever going to process the data in bulk, then chances are you'll get the most performance out of a raw HDFS file. However, if you need random access to individual entries, then HBase will give you a significant benefit.
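
To make the access-pattern point a bit more concrete, here is a rough Python sketch. It is a toy, not HBase or HDFS code: the record layout and the names `scan_lookup` and `indexed_lookup` are made up for illustration. It only contrasts reading a flat file front to back with looking up a single key in a sorted, keyed structure.

```python
import bisect

# Hypothetical toy data: (row_key, value) records, sorted by key.
records = [(f"user{i:04d}", i * 10) for i in range(1000)]
keys = [k for k, _ in records]


def scan_lookup(recs, key):
    """Flat-file style: read records in order until the key turns up.

    Returns (value, number_of_records_read).
    """
    reads = 0
    for k, v in recs:
        reads += 1
        if k == key:
            return v, reads
    return None, reads


def indexed_lookup(recs, sorted_keys, key):
    """Keyed-store style: binary search over the sorted keys."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return recs[i][1]
    return None
```

If every job touches every record anyway, the scan costs nothing extra, which is the bulk-processing case. It is the one-off lookup of a single entry where the scan hurts and a keyed layout pays off.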

There are other factors that go into this decision. One that comes to mind is whether you'd like to take advantage of HBase's versioning and semi-defined schema for your dataset. It would be a little complicated to duplicate all of that logic on your own on top of a flat file.

Another factor is your system's workflow. If you use HDFS files, you need to be OK with always rewriting the files to do any "updates". So even if you only add 1MB of new data to a 1TB dataset, you have to rewrite the whole thing. HBase would let you "insert" it where it belongs. (Of course, HBase sits on top of HDFS and so has the same constraints as your applications do, except we've already done the work to manage random inserts.)
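
Here is a rough Python sketch of the difference in update cost. It is a simplified toy under my own assumptions, not HBase's actual implementation: `flat_file_insert` and `ToyStore` are hypothetical names, and the store just buffers writes in memory and flushes them as new immutable sorted segments instead of rewriting old data.

```python
def flat_file_insert(sorted_records, new_records):
    """Flat-file style update: merge and rewrite the whole file.

    Returns (new_file_contents, records_written).
    """
    merged = sorted(sorted_records + new_records)
    return merged, len(merged)


class ToyStore:
    """Toy keyed store: writes go to an in-memory buffer and are flushed
    as a new immutable sorted segment; old segments are never rewritten."""

    def __init__(self):
        self.buffer = {}
        self.segments = []
        self.records_written = 0  # records written out to segments so far

    def put(self, key, value):
        self.buffer[key] = value

    def flush(self):
        segment = sorted(self.buffer.items())
        self.segments.append(segment)
        self.records_written += len(segment)
        self.buffer = {}

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        # Newest segment first, so later writes win over older ones.
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None


# Usage: add one record to 10,000 existing ones.
existing = [(f"row{i:05d}", i) for i in range(10000)]

_, flat_written = flat_file_insert(existing, [("row00500x", -1)])

store = ToyStore()
for k, v in existing:
    store.put(k, v)
store.flush()
before = store.records_written
store.put("row00500x", -1)
store.flush()
store_written = store.records_written - before
```

In this toy, the flat-file path rewrites all 10,001 records to add one, while the store writes only the new record. The price is that reads may have to consult several segments, which is the kind of bookkeeping a store has to manage for you.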

Does this help you out?

-Bryan

On May 13, 2008, at 10:13 AM, Naama Kraus wrote:

Hi,

Can anyone say a few words on when to use HBase as opposed to plain MapReduce on input files?
In more detail: when does it make sense to put data into HBase and then use HBase methods to access it, including running MapReduce on the data in the tables, as opposed to simply putting the data into HDFS and processing it with MapReduce?

Thanks, Naama

On Wed, Mar 12, 2008 at 12:15 AM, Bryan Duxbury <[EMAIL PROTECTED]> wrote:

I've written up a blog post discussing when I think it's appropriate to use HBase, in response to some of the questions people usually ask. You can find it at http://blog.rapleaf.com/dev/?p=26.

-Bryan




