Hey Stuart

I did that for a client using Cascading events and SQS.

When a job completed, it dropped a message on an SQS queue; a listener picked up the message and either kicked off new jobs or decided to kill off the cluster. The currently shipping EC2 scripts support running multiple simultaneous clusters, which suits this purpose.
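The listener's decision step can be sketched roughly like this. This is a minimal sketch, not the client's actual code: the JSON message shape, field names, and the `next_action` helper are all assumptions, and the real wiring would read message bodies off SQS (e.g. via boto) rather than from canned strings.

```python
import json

# Hypothetical message shape -- the thread doesn't describe the real payload.
# Assume each completed job posts JSON like:
#   {"job": "step-1", "status": "completed", "remaining": ["step-2"]}
def next_action(message_body):
    """Decide what the listener should do with a job-completion message."""
    msg = json.loads(message_body)
    if msg.get("status") != "completed":
        return ("kill-cluster", None)     # a failed job: tear the cluster down
    remaining = msg.get("remaining", [])
    if remaining:
        return ("run-job", remaining[0])  # launch the next job in the chain
    return ("kill-cluster", None)         # nothing left to do: shut down

# In production the body would arrive via an SQS receive loop; here we just
# feed the function a canned message.
action, job = next_action(
    '{"job": "step-1", "status": "completed", "remaining": ["step-2"]}')
print(action, job)
```

The point is only that the queue message carries enough state for the listener to choose between "run the next job" and "terminate the cluster" without any human in the loop.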

Cascading has always supported raw file access on S3, and now Hadoop does too (thanks Tom), so this workflow is quite natural. It is the best approach, since data is pulled directly into the Mapper instead of being copied onto HDFS first and then read into the Mapper from HDFS.
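Concretely, the direct approach just means pointing the job's input and output at S3 URIs. A sketch below, with made-up bucket names, jar path, and mapper/reducer scripts (it only builds and prints the command line, assuming Hadoop streaming and the s3n filesystem):

```python
# Sketch only: bucket names, jar, and scripts are hypothetical.
# With s3n:// URIs the Mappers read straight from S3, skipping the
# "copy to HDFS, then read from HDFS" round trip.
input_uri = "s3n://my-bucket/logs/2008-10-23/"
output_uri = "s3n://my-bucket/results/2008-10-23/"

hadoop_cmd = [
    "hadoop", "jar", "hadoop-streaming.jar",
    "-input", input_uri,     # Mappers pull records directly from S3
    "-output", output_uri,   # results land back on S3, surviving cluster teardown
    "-mapper", "map.py",
    "-reducer", "reduce.py",
]

# The two-step alternative this avoids would be roughly:
#   hadoop distcp s3n://my-bucket/logs/... hdfs:///logs/...
#   hadoop jar ... -input hdfs:///logs/... -output hdfs:///results/...
print(" ".join(hadoop_cmd))
```

A nice side effect: because results are written back to S3, the cluster holds no state worth keeping and can be terminated the moment the last job finishes.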

YMMV

chris

On Oct 23, 2008, at 7:47 AM, Stuart Sierra wrote:

Hi folks,
Anybody tried scripting Hadoop on EC2 to...
1. Launch a cluster
2. Pull data from S3
3. Run a job
4. Copy results to S3
5. Terminate the cluster
... without any user interaction?

-Stuart

--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
