Hi Keith,

"I am looking for the easiest way to bring up an HBase and Hadoop environment as the persistence mechanism for a Grails based web application."

I think Ryan has already cleared up the doubts about using HBase for live applications. To get a head start, you might look at a new Grails plugin: http://grails.org/plugin/gorm-hbase
-sanjay

-----Original Message-----
From: Ryan Rawson [mailto:ryano...@gmail.com]
Sent: Wednesday, October 14, 2009 1:32 PM
To: hbase-user@hadoop.apache.org
Subject: Re: On storing HBase data in AWS S3

Hey!

I strongly disagree with Tatsuya's assessment of HBase, specifically below:

On Wed, Oct 14, 2009 at 12:31 AM, Tatsuya Kawano <tatsuy...@snowcocoa.info> wrote:
> Hi Keith,
>
> On Wed, Oct 14, 2009 at 11:58 AM, Keith Thomas <keith.tho...@gmail.com> wrote:
>> Am I correct in understanding that a farm of EC2 instances with Hadoop and
>> HBase installed and configured individually by myself is the quickest and
>> most effective way to progress with this effort?
>
> Well, you're not wrong. To run HBase on Amazon Web Services, you
> should use EC2 instances and configure them yourself. Make sure you
> pick Extra Large instances from EC2 (see:
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A8), and you may
> also want EBS volumes as the storage devices rather than S3. (S3 is
> good for archiving data.)
>
> But...
>
> Are you really sure you want to use HBase for your Grails-based web
> application on the cloud? I would definitely recommend MySQL, which
> should be more suitable for both web applications and the Amazon Web
> Services environment. HBase is not a cloud database and is currently
> more suitable for batch processing with billions of records.

This is not a correct assessment. First off, what does it mean to be a "cloud database"? Secondly, HBase is suitable for serving real-time queries, and that is a major use case for us here at StumbleUpon.

> If you use HBase for this purpose, you will
>
> -- lose the Object Relational Mapping support from Grails.
> -- have to take care of database transactions and secondary indices
> yourself.

You do "lose" the transactions (if you even used them) and you may have to maintain secondary indexes, but you gain a flexible, schema-less, column-oriented datastore that scales far beyond anything MySQL can do.
> -- likely suffer from data-retrieval latency, unless you use memcached.

This is not correct. HBase has good caching built in and takes full advantage of Linux's disk buffer cache. It is much more effective than MySQL here, because it is easier to get more RAM across 10-20 machines (or more) than in 1-2 machines.

> -- need more server resources than MySQL. MySQL can run on 1 EC2
> instance, while HBase requires about 12 EC2 instances (2 for masters
> and DFS namenodes, 5 for region servers and DFS datanodes, 5 for
> ZooKeeper)

Again, this is not entirely correct; you are over-provisioning quite a bit. 3 ZK nodes is fine, and they should be able to run on the "master" nodes. You also reveal a misunderstanding by suggesting to the OP that you can run the namenode on 2 hosts and that is that. The situation for HDFS is (unfortunately) more complicated than that.

It is entirely possible to run an HBase cluster on 4 EC2 instances: 1 master and 3 datanodes. Maybe even fewer, but then you are sacrificing data reliability.

I appreciate your enthusiasm for HBase, but please don't mislead our users so badly!

Thanks,
-ryan

> Is there any special reason to use HBase for your web application?
>
> Thanks,
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
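
For readers comparing the two sizing estimates in this thread, the instance counts can be tallied with a small sketch like the one below. It is only an illustration: the dictionary keys and the `total_instances` helper are made-up names, not part of HBase or EC2. The thread also does not spell out where ZooKeeper runs in the 4-instance layout, so the sketch assumes it shares existing hosts and adds no instances of its own.

```python
# Rough tally of the EC2 instance counts discussed in this thread.
# All names here are illustrative; nothing below is an HBase or AWS API.

# Tatsuya's estimate: each role gets its own set of instances.
separate_roles = {
    "masters and DFS namenodes": 2,
    "region servers and DFS datanodes": 5,
    "ZooKeeper": 5,
}

# Ryan's minimal layout: 1 master and 3 datanodes.
# Assumption: ZooKeeper is colocated on existing hosts (0 extra instances).
colocated_roles = {
    "master": 1,
    "datanodes": 3,
    "ZooKeeper (colocated, assumed)": 0,
}


def total_instances(layout):
    """Sum the per-role instance counts of a layout."""
    return sum(layout.values())


if __name__ == "__main__":
    for name, layout in (("separate roles", separate_roles),
                         ("colocated roles", colocated_roles)):
        print(f"{name}: {total_instances(layout)} instances")
```

The smaller layout trades redundancy for cost, which matches the caveat above that going below 4 instances sacrifices data reliability.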