On 09/12/10 18:57, Aaron Eng wrote:
Pros:
- Easier to build out and tear down clusters vs. using physical machines in
a lab
- Easier to scale up and scale down a cluster as needed

Cons:
- Reliability.  In my experience I've had machines die, had machines fail to
start up, had network outages between Amazon instances, etc.  These problems
have occurred at a far more significant rate than any physical lab I have
ever administered.
- Money. You get charged for problems with their system.  Need to add
storage space to a node?  That means renting space from EBS which you then
need to actually spend time formatting to ext3 so you can use it with
Hadoop.  So every time you want to use storage, you're paying Amazon to
format it because you can't tell EBS that you want an ext3 volume.
- Visibility.  Amazon loves to report that all their services are working
properly on their website, meanwhile, the reality is that they only report
issues if they are extremely major.  Just yesterday they reported "increased
latency" on their us-east-1 region.  In reality, "increased latency" means
50% of my Amazon API calls were timing out, I could not create new
instances and for about 2 hours I could not destroy the instances I had
already spun up.  How's that for ya?  Paying them for machines that they
won't let me terminate...


That's the harsh reality of all VMs: you need to monitor them and stamp on things that misbehave. The nice thing is that this is easy to do: just GET the HTTP status pages and kill any VM that misbehaves.
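
A minimal sketch of that kind of check, in Python. Everything named here (http_probe, find_unhealthy, the VM ids, the URLs and the three-strikes threshold) is an assumption for illustration, not part of any Hadoop or EC2 tooling. It only decides which VMs to kill; the actual terminate call to your provider is left out.

```python
import http.client
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the status page answers with a 2xx code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Timeouts, refused connections, 4xx/5xx and bad URLs all count as a failure.
        return False


def find_unhealthy(status_urls, failures, probe=http_probe, max_failures=3):
    """Probe every VM once and return the ids that should be killed.

    status_urls: dict of VM id -> status page URL (hypothetical ids and URLs).
    failures:    dict of VM id -> consecutive failure count, updated in place
                 so a single network blip does not get a VM killed.
    """
    to_kill = []
    for vm_id, url in sorted(status_urls.items()):
        if probe(url):
            failures[vm_id] = 0
        else:
            failures[vm_id] = failures.get(vm_id, 0) + 1
            if failures[vm_id] >= max_failures:
                to_kill.append(vm_id)
    return to_kill
```

You would call find_unhealthy() on a timer, keep the same failures dict between runs, and hand the returned ids to whatever terminate call your infrastructure offers.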

This is not a fault of EC2: any VM infrastructure has this feature. You can't control where your VMs come up, you are penalised by other CPU-heavy machines on the same server, and Amazon throttles the smaller machines a bit.

But you
- don't pay for cluster time you don't need
- don't pay for ingress/egress for data you generate in the vendor's infrastructure (just storage)
- can be very agile with cluster size.

I have a talk on this topic for the curious, discussing a UI that is a bit more agile, but even there we deploy agents to every node to keep an eye on the state of the cluster.

http://www.slideshare.net/steve_l/farming-hadoop-inthecloud
http://blip.tv/file/3809976

Hadoop is designed to work well in a large-scale static cluster of fixed machines. There, the reaction of clients to server failure (spin) and that of servers to misbehaving clients (blacklist them) are the right ones, because they leave ops in control. In a virtual world you want the clients to see (somehow) if the master nodes have moved, and you want the servers to kill the misbehaving VMs to save money, and then create new ones.
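
The kill-and-replace half of that can be sketched as a small planning step, again in Python. The function name, the health map and the target size are all hypothetical; the sketch assumes something else has already decided which VMs are healthy, and it leaves the real terminate and launch calls to your cloud API.

```python
def plan_cluster_changes(health, target_size):
    """Decide which VMs to terminate and how many replacements to launch.

    health:      dict of VM id -> True if healthy, False if misbehaving
                 (hypothetical ids; how health is measured is up to you).
    target_size: number of healthy workers the cluster should have.
    Returns (ids_to_terminate, number_to_launch).
    """
    to_terminate = sorted(vm_id for vm_id, ok in health.items() if not ok)
    healthy_count = len(health) - len(to_terminate)
    # Never launch a negative number: an oversized cluster is left alone here.
    launch_count = max(0, target_size - healthy_count)
    return to_terminate, launch_count
```

With three VMs of which one is misbehaving and a target of four, this asks for the bad VM to be terminated and two new ones to be created.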

-Steve
