Re: On storing HBase data in AWS S3

Tatsuya Kawano Wed, 14 Oct 2009 00:32:32 -0700

Hi Keith,

On Wed, Oct 14, 2009 at 11:58 AM, Keith Thomas <keith.tho...@gmail.com> wrote:
> Am I correct in understanding that a farm of EC2 instances with Hadoop and
> HBase installed and configured individually by myself are the quickest and
> most effective way to progress with this effort?


Well, you're not wrong. To run HBase on Amazon Web Services, you
should use EC2 instances and configure them yourself. Make sure you
pick Extra Large instances from EC2 (see:
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A8), and you may
also want EBS volumes as the storage devices rather than S3. (S3 is
good for archiving data.)


But...

Are you really sure you want to use HBase for your Grails-based web
application in the cloud? I would definitely recommend MySQL, which
should be more suitable for both web applications and the Amazon Web
Services environment. HBase is not a cloud database and is currently
more suitable for batch processing with billions of records.

If you use HBase for this purpose, you will:

-- lose the Object Relational Mapping support from Grails.
-- have to take care of database transactions and secondary indices yourself.
-- likely suffer from data retrieval latency, unless you use memcached.
-- need more server resources than MySQL. MySQL can run on 1 EC2
instance, while HBase requires about 12 EC2 instances (2 for masters
and DFS namenodes, 5 for region servers and DFS datanodes, 5 for
ZooKeeper).
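
To make the resource comparison concrete, here is a quick tally of the
node counts listed above. It is only a sketch: the dictionary keys are
labels invented for this illustration, and the counts are the rough
figures from this message, not fixed requirements.

```python
# Node counts as given in the message above; the key names are
# hypothetical labels for the roles, not HBase configuration settings.
hbase_cluster = {
    "masters_and_namenodes": 2,
    "region_servers_and_datanodes": 5,
    "zookeeper": 5,
}
mysql_instances = 1  # MySQL can run on a single EC2 instance

# Total EC2 instances for the HBase setup described above.
hbase_instances = sum(hbase_cluster.values())

print(f"HBase: about {hbase_instances} instances")
print(f"MySQL: {mysql_instances} instance")
```

The total comes to 12 instances for HBase against 1 for MySQL.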


Is there any special reason to use HBase for your web application?

Thanks,

-- 
Tatsuya Kawano (Mr.)
Tokyo, Japan
