> So unless the process is drastically simpler than I've estimated, I
> think my next stop is going to be a SimpleDB tutorial, keeping my
> hbase work handy as another alternative.
Well, SimpleDB is out - the limited beta is closed. That leaves me with just S3.
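
For the S3 route, the hierarchical naming from option (a) in the quoted message could look roughly like the sketch below. The function name, the 9-digit padding, and the 3-digit grouping are all assumptions made for illustration, not something settled in this thread. S3 keys are flat strings, so the "directories" here are only key prefixes.

```python
def s3_key_for(record_id, width=9, group=3):
    """Map a numeric id to a nested object key.

    Hypothetical helper: the name, width and group size are
    illustrative assumptions, not part of any S3 API.
    """
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    # Zero-pad so every id yields the same number of prefix levels.
    padded = str(record_id).zfill(width)
    # Everything except the last `group` digits becomes the prefix,
    # so each leaf prefix holds at most 10**group objects.
    prefix_digits = padded[:-group]
    levels = [prefix_digits[i:i + group]
              for i in range(0, len(prefix_digits), group)]
    # Keep the unpadded id in the file name, matching 1.txt, 2.txt etc.
    return "/".join(levels + ["%d.txt" % record_id])


print(s3_key_for(1))        # 000/000/1.txt
print(s3_key_for(4000123))  # 004/000/4000123.txt
```

With the defaults, about 4 million pairs spread over a few thousand leaf prefixes of at most 1000 objects each.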
-- Jim
On Thu, May 8, 2008 at 10:19 AM, Jim R. Wilson <[EMAIL PROTECTED]> wrote:
> Unfortunately, I'm about to give up on hbase over ec2.
>
> In my application, the hbase storage is very simple, write-once text
> storage. To get this to work on ec2, I've concluded I need the
> following:
>
> 1. A cluster of hadoop machines running an appropriate version of
> hadoop (0.16.3 at the time of this writing)
>
> 2. Hbase running on the same cluster, either connected to S3, which
> I've been warned is slow, or HDFS on top of PersistentFS, which may or
> may not fare better.
>
> 3. Thrift service running atop hbase for interaction from remote
> (outside ec2) Python and PHP scripts.
>
> 4. Static IPs for any hadoop nodes running data-transfer jobs due to
> firewall restrictions on the MySQL end (outside ec2), and also so that
> the Python/PHP scripts know where to find Thrift.
>
> 5. A mechanism to force all hbase nodes to write any memory-resident
> changes to disk for backup purposes (Java).
>
> Now, my particular problem is very simple - just numeric key to text
> storage. Ex: { "1":"Hello", "2":"World" }. I've (nearly) come to the
> conclusion that I would be much better off either:
>
> a. Using an S3 bucket to store 1.txt, 2.txt etc (probably with a
> hierarchical dir structure to keep the directories small - I've got
> about 4 million such number/text pairs at the moment).
>
> b. Using SimpleDB (which I've yet to learn, but expect to be similar
> to hbase/BigTable)
>
> c. Running an hbase/hadoop cluster somewhere else (I already have a
> single-node cluster working great on our hosting provider's internal
> network).
>
> So unless the process is drastically simpler than I've estimated, I
> think my next stop is going to be a SimpleDB tutorial, keeping my
> hbase work handy as another alternative.
>
> -- Jim R. Wilson (jimbojw)
>