We can get three types of machines: 2-disk, 6-disk and 16-disk. They all have 4 dual-core CPUs.

The 2-disk machines have about 1 TB, the 6-disk machines about 3 TB and the 16-disk machines about 8 TB. The 16-disk machines have CPUs about 25% slower than those in the 2- and 6-disk machines.

We handle a lot of bulky data, and don't think we can fit it all on the 3 TB machines if those are our sole compute/DFS nodes.
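As a rough check on that concern, the sketch below estimates how much data a given mix of nodes could hold in DFS. The per-node capacities are the approximate figures quoted above; the replication factor of 3 is the HDFS default, and the 25% of each disk held back for local map/reduce data is purely an assumption for illustration, as are the function and variable names.

```python
# Approximate raw capacity per node type in TB, from the figures quoted above.
RAW_TB = {"2-disk": 1.0, "6-disk": 3.0, "16-disk": 8.0}


def usable_dfs_tb(node_counts, replication=3, reserved_fraction=0.25):
    """Estimate how many TB of unique data a cluster can hold in DFS.

    node_counts       -- dict mapping node type to number of nodes
    replication       -- copies kept of each block (3 is the HDFS default)
    reserved_fraction -- share of raw disk kept for non-DFS use
                         (an assumed value, not a measured one)
    """
    raw = sum(RAW_TB[kind] * count for kind, count in node_counts.items())
    return raw * (1.0 - reserved_fraction) / replication
```

Under these assumptions, twenty 6-disk nodes hold roughly 15 TB of unique data, while sixteen 6-disk nodes plus four 16-disk nodes hold roughly 20 TB.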

From my reading, I conjecture that an ideal configuration would be one local disk per CPU for local data and reducing, plus some number of separate disks for DFS.
Is this an accurate assessment?


Doug Cutting wrote:
If you're building a cluster from scratch, why not put a medium number of disks on all nodes, rather than some with more and some with less? That's the optimal configuration for Hadoop, since it best distributes data among computing nodes.

Doug

Jason Venner wrote:
We are starting to build larger clusters, and want to better understand how to configure the network topology. Up to now we have just been setting up a private VLAN for the small clusters.

We have been thinking about the following machine configurations:
- Compute nodes with a number of spindles and a medium amount of disk, that also serve DFS.
- For every 4-8 of the above, one compute node with a large number of spindles and a large amount of disk, to bulk out the DFS capacity.
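To see how unevenly such a mix would spread the data, here is a small sketch of the share of raw disk that ends up on each large node's group. The 3 TB and 8 TB defaults are assumed sizes for the "medium" and "large" nodes, and the function name is made up for illustration.

```python
def large_node_share(medium_per_large, medium_tb=3.0, large_tb=8.0):
    """Fraction of raw capacity held by the one large node in a group.

    medium_per_large -- medium nodes deployed per large node (4 to 8 above)
    medium_tb        -- assumed raw capacity of a medium node, in TB
    large_tb         -- assumed raw capacity of a large node, in TB
    """
    group_total = large_tb + medium_per_large * medium_tb
    return large_tb / group_total
```

With these assumed sizes, one large node per 4 medium nodes holds about 40% of the group's raw capacity, and one per 8 holds about 25%, even though it is only 1 of 5 or 1 of 9 machines doing the computing.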

We are wondering what the best practices are for network topology in clusters that are built out of the above building blocks.
We can readily have 2 or 4 network cards in each node.
