> Not always being able to read back an object that has been written is deadly.
> Having the S3 client cache written data for a while can help but isn't a
> complete solution because the RS can fail and its regions will be reassigned
> to another RS... who then might not be able to read the data. A region
> might bounce around the cluster taking exceptions on open for a while. This
> availability problem could eventually stall all clients. To address this, you
> could implement a distributed write-behind cache for S3, but is it worth the
> effort and added complexity?

Argh. Eventual consistency bites. Perhaps HDFS on EBS is the only viable 
solution after all.

The trouble is cost: S3 is 14 cents a GB-month with full redundancy (whatever 
that means), whereas EBS is 10 cents a GB-month, and EBS' redundancy may not 
really be adequate on its own. You probably need 2 or 3 HDFS block replicas on 
top of it, so EBS storage effectively costs 20 or 30 cents a GB-month, 
depending on your pain threshold.
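The arithmetic above can be sketched quickly; a minimal back-of-the-envelope calculation using the per-GB prices quoted here (the replica counts are just the 2-3 range mentioned, not a recommendation):

```python
# Rough cost comparison in USD per GB-month, using the prices quoted above.
S3_PRICE = 0.14   # S3, redundancy included in the price
EBS_PRICE = 0.10  # EBS, per raw GB-month

def hdfs_on_ebs_cost(replicas):
    """Effective cost per logical GB-month with HDFS replication over EBS."""
    return replicas * EBS_PRICE

for r in (2, 3):
    print("%d replicas: $%.2f/GB-month vs S3 at $%.2f" %
          (r, hdfs_on_ebs_cost(r), S3_PRICE))
```

So at 2 replicas EBS is already pricier than S3, and at 3 replicas it is more than double, which is the cost tension the paragraph describes.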

I am most interested in running HBase well in the cloud - on EC2 and on 
OpenStack-based IaaSes.

Thanks for sharing your insights, Andrew.


Jagane
