On Fri, 2009-04-03 at 11:19 +0100, Steve Loughran wrote:
> True, but this way nobody gets the opportunity to learn how to do it
> themselves, which can be a tactical error one comes to regret further
> down the line. By learning the pain of cluster management today, you get
> to keep it under control as your data grows.

Personally I don't want to have to learn (and especially not support in production) the EC2 / S3 part, so it does sound appealing.

On a side note, I'd hope that at some point they give some control over the priority of the overall job, on the level of "you can boot up these machines whenever you want" or "boot up these machines now". That should let them manage the load on their hardware and reduce costs (which I'd obviously expect them to pass on to the users of low-priority jobs). I'm not sure how that would fit into the "give me 10 nodes" method at the moment.

> I am curious what bug patches AWS will supply, for they have been very
> silent on their hadoop work to date.

I'm hoping it will involve security of EC2 images, but I'm not expecting it.