Jeremy,

That really depends on how accurate you want your tests to be vs. how long you 
want to spend running them.  Automated OS benchmarks often reinstall the OS 
before running tests to ensure it is in a known state, and file system 
benchmarks usually unmount and remount the file system so that caches are 
empty.  If you want to be very accurate, reformat HDFS and reconfigure 
everything.  However, even in ideal conditions it is difficult to get 
consistent performance numbers out of a multi-node cluster.  I would suggest 
you bring up the cluster with the maximum number of nodes you want to test 
with, then shut down the DataNodes and TaskTrackers on the machines you don't 
want (just as if those boxes had died), and wait for the data on them to 
finish being re-replicated.  The result should be fairly close to what you 
would expect.
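
If it helps, here is a rough sketch of that (assuming a 0.20/1.x-style 
install with hadoop-daemon.sh under $HADOOP_HOME/bin):

    # On each machine you want to drop from the cluster (as if the box died):
    $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
    $HADOOP_HOME/bin/hadoop-daemon.sh stop datanode

    # Then, from any remaining node, wait until fsck stops reporting
    # under-replicated blocks:
    $HADOOP_HOME/bin/hadoop fsck / | grep -i "under-replicated"

Once fsck reports the file system healthy again, the remaining nodes hold a 
full copy of the data and you can run the smaller-scale test.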

You can also do it programmatically.  The admin interfaces on the NameNode and 
JobTracker let you blacklist (decommission) machines, but I have not done it 
myself.
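
If you go that route, the usual mechanism is an exclude file; a sketch, 
assuming dfs.hosts.exclude and mapred.hosts.exclude in your config already 
point at that file (the path and hostname below are just placeholders):

    # Add the hosts you want to drop to the exclude file:
    echo "node07.example.com" >> /path/to/conf/excludes

    # Tell the NameNode and JobTracker to re-read their host lists:
    $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes
    $HADOOP_HOME/bin/hadoop mradmin -refreshNodes

    # Check decommissioning progress:
    $HADOOP_HOME/bin/hadoop dfsadmin -report

The affected DataNodes should show up as "Decommission in progress" in the 
report and switch to "Decommissioned" once their blocks have been copied off.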

--Bobby Evans


On 8/29/11 12:07 AM, "Jeremy Villalobos" <jeremyvillalo...@gmail.com> wrote:

Hello:

The following questions are from a system administrator's point of view.

How do I run scale tests using different numbers of nodes?  Do I have to 
shut down and restart Hadoop to do this?
What about DFS: do I have to reformat when reducing the number of nodes?

Is there a "machines file," as in MPI, where I can specify just the number of 
nodes to be used for a test?

Thanks.
