What about just taking down half of the nodes and then loading your data into the remainder? It should take about 20 minutes each time you remove nodes, but only a few seconds each time you add some. Remember that you need to reload the data each time (or rebalance it if growing the cluster) to get realistic numbers.
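
If you go that route, it's worth confirming how many datanodes are actually up before each timing run, so each data point is labeled with the right cluster size. "hadoop dfsadmin -report" will tell you from the command line; below is a rough, untested sketch of doing the same check from Java. The package names shown are the current org.apache.hadoop.hdfs ones -- on a 0.16 cluster the equivalent classes live under org.apache.hadoop.dfs, if memory serves -- so treat the details as approximate.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Sketch: ask the namenode which datanodes it currently knows about, so you
// can record how many nodes each timing run really had. The report covers
// every registered datanode; filter on its state if you only want live ones.
public class LiveNodeCount {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up the cluster's site config
    FileSystem fs = FileSystem.get(conf);
    if (!(fs instanceof DistributedFileSystem)) {
      System.err.println("default filesystem is not HDFS");
      return;
    }
    DatanodeInfo[] nodes = ((DistributedFileSystem) fs).getDataNodeStats();
    System.out.println("datanodes reported: " + nodes.length);
    for (DatanodeInfo node : nodes) {
      System.out.println("  " + node.getHostName());
    }
  }
}
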
My suggested procedure would be to take all but 2 nodes down, and then:

- run test
- double number of nodes
- rebalance file storage
- lather, rinse, repeat

On 3/12/08 3:28 PM, "Chris Dyer" <[EMAIL PROTECTED]> wrote:

> Hi Hadoop mavens-
> I'm hoping someone out there will have a quick solution for me. I'm
> trying to run some very basic scaling experiments for a rapidly
> approaching paper deadline on a 0.16.0 Hadoop cluster that has ~20 nodes
> with 2 procs/node. Ideally, I would want to run my code on clusters
> of different numbers of nodes (1, 2, 4, 8, 16) or some such thing.
> The problem is that I am not able to reconfigure the cluster (in the
> long run, i.e., before a final version of the paper, I assume this
> will be possible, but for now it's not). Setting the number of
> mappers/reducers does not seem to be a viable option, at least not in
> the trivial way, since the physical layout of the input files makes
> Hadoop run a different number of tasks than I request (most of my
> jobs consist of multiple MR steps, the initial one always running on a
> relatively small data set, which fits into a single block, and
> therefore the Hadoop framework does honor my task number request on
> the first job -- but during the later ones it does not).
>
> My questions:
> 1) Can I get around this limitation programmatically? I.e., is there
> a way to tell the framework to only use a subset of the nodes for DFS
> / mapping / reducing?
> 2) If not, what statistics would be good to report if I can only have
> two data points -- a legacy "single-core" implementation of the
> algorithms and a MapReduce version running on a full cluster?
>
> Thanks for any suggestions!
> Chris
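
Re: question 1 -- as far as I know there is no per-job switch that restricts which nodes DFS or the tasktrackers will use, which is why I'd physically shrink the cluster as described above. What you can control from the job itself is the reduce count, which is honored exactly; the map count is only a hint that the InputFormat reconciles with the block layout of the input, which is exactly why your later, multi-block jobs ignore it. Here is a rough sketch against the old mapred API (identity mapper/reducer as stand-ins for your own classes; the static path setters may live elsewhere in 0.16, so treat the details as approximate):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Sketch: a pass-through job that pins the reduce count and *requests* a map
// count. args[0] = input dir, args[1] = output dir.
public class TaskCountDemo {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TaskCountDemo.class);
    conf.setJobName("task-count-demo");

    conf.setMapperClass(IdentityMapper.class);    // swap in your real mapper
    conf.setReducerClass(IdentityReducer.class);  // swap in your real reducer
    conf.setOutputKeyClass(LongWritable.class);   // TextInputFormat's key type
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    conf.setNumReduceTasks(8);  // honored exactly

    // Only a hint: with the stock FileInputFormat you still get roughly one
    // map per input block at minimum, so a multi-block input runs more maps
    // than this regardless of what you request.
    conf.setNumMapTasks(8);

    JobClient.runJob(conf);
  }
}

As far as I can tell, the hint can split a single small block into more maps (which is why your first job behaves) but never merges blocks into fewer maps, so it won't help you emulate a smaller cluster.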
