On Thu, Oct 6, 2011 at 11:21 AM, Daniel Sikar <[email protected]> wrote:
> > If you buy the argument that EBS is resilient storage
>
> Just for the record, data has been lost in EBS.

Right. That's why I qualified the statement with 'if you buy the argument...'. From Amazon's website:

'The durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot. As an example, volumes that operate with 20 GB or less of modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% – 0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that will typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives.'

For Hadoop, a good strategy may be to use ephemeral storage for MapReduce temp space and EBS for HDFS data. If the data was loaded into HDFS through some ETL processing, and the source data is still in S3, that's all the resiliency you need.

Of course, it is unfortunate that OpenStack and other home-brew clouds do not have an EBS-equivalent technology. Just about now, an HDFS-friendly, EBS-equivalent storage technology for OpenStack sounds like a good idea.

Finally, note that I had not yet mentioned the cost of accessing EBS volumes. It costs ten cents for every million I/O requests. How the heck do you project that cost???

Jagane
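A per-volume AFR reads small, but a Hadoop cluster runs many volumes. Here is a rough sketch of what the quoted figures mean for a whole fleet. It assumes volume failures are independent, which real systems do not guarantee. It also assumes the quoted 0.1% to 0.5% range applies, which, per Amazon's text, holds only for volumes with 20 GB or less of modified data since the last snapshot. The function name and the 100-volume fleet are illustrative, not anything from Amazon.

```python
def prob_any_volume_lost(afr: float, volumes: int, years: float = 1.0) -> float:
    """Chance that at least one of `volumes` volumes is completely lost
    within `years`, given a per-volume annual failure rate `afr`.

    Assumes failures are independent across volumes and over time, which
    real systems do not guarantee, so read the result as a rough estimate.
    """
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be a probability between 0 and 1")
    # Probability that a single volume survives the whole period.
    survive_one = (1.0 - afr) ** years
    # Probability that every volume survives; the complement is "at least one lost".
    return 1.0 - survive_one ** volumes


if __name__ == "__main__":
    # AFRs from the quoted Amazon text (EBS range) and the ~4% it cites for
    # commodity disks; 100 volumes is an arbitrary, hypothetical fleet size.
    for label, afr in [("EBS, low", 0.001), ("EBS, high", 0.005), ("commodity disk", 0.04)]:
        p = prob_any_volume_lost(afr, volumes=100)
        print(f"{label:15s} AFR={afr:.1%}  P(>=1 of 100 lost in a year) = {p:.1%}")
```

Even at the high end of the EBS range, losing at least one of 100 volumes in a year is far from negligible. That is why keeping the source data in S3 matters.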
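One way to project the I/O charge is to start from a sustained average request rate per volume and multiply it out. This is a minimal sketch under stated assumptions. The ten-cents-per-million rate comes from the post. The 30-day month and the sample rates are hypothetical. Estimating the rate (e.g. from the r/s and w/s columns of `iostat -x` on a comparable workload) is assumed, and how that compares with Amazon's own request counting is not confirmed here.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day billing month
PRICE_PER_MILLION_REQUESTS = 0.10      # USD, the rate quoted in the post


def monthly_io_cost(avg_iops: float,
                    price_per_million: float = PRICE_PER_MILLION_REQUESTS) -> float:
    """Project one month's I/O request charge from a sustained average
    request rate in requests per second."""
    if avg_iops < 0:
        raise ValueError("avg_iops must be non-negative")
    requests = avg_iops * SECONDS_PER_MONTH
    return requests / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Hypothetical per-volume request rates, from a quiet volume to a busy
    # HDFS DataNode disk; substitute rates measured on your own workload.
    for iops in (10, 100, 500):
        print(f"{iops:4d} IOPS -> ${monthly_io_cost(iops):,.2f} per month")
```

The hard part is not the multiplication but the input. MapReduce and HDFS request rates vary a lot from job to job, so a projection is only as good as the measured rate fed into it.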
