Re: Separate data-nodes from worker-nodes

Doug Cutting Fri, 14 Mar 2008 12:12:14 -0700

Andrey Pankov wrote:

It's a little bit expensive to have big cluster running for a long period, especially if you use EC2. So, as possible solution, we can start additional nodes and include them into cluster before running job, and then, after finishing, kill unused nodes.

As Ted has indicated, that should work. It won't be as fast as if you keep the entire cluster running the whole time, but it will be much cheaper.

An alternative is to store your persistent data in S3. Then you can shut down your cluster altogether when you're not computing. Your startup time each day will be slower, since reading from S3 is slower than reading from HDFS, so this may or may not be practical for you.


Doug
