If you're building a cluster from scratch, why not put a medium number of disks on every node, rather than giving some nodes more and others fewer? That's the optimal configuration for Hadoop, since it distributes data most evenly among the computing nodes.
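
To make the distribution argument concrete, here is a rough back-of-the-envelope sketch. It assumes equal-size disks and that, over time, each node ends up holding data roughly in proportion to its disk count; that is a simplification, not a description of Hadoop's actual block placement policy. The node counts and the function names are hypothetical, chosen only for illustration.

```python
def storage_share(disks_per_node):
    """Fraction of total DFS capacity on each node (assumes equal-size disks)."""
    total = sum(disks_per_node)
    return [d / total for d in disks_per_node]


def locality_skew(disks_per_node):
    """Ratio of each node's data share to its compute share.

    Every node is assumed to contribute the same compute capacity (1/N),
    so a value of 1.0 means the node holds exactly its fair share of data.
    """
    n = len(disks_per_node)
    return [share * n for share in storage_share(disks_per_node)]


# Hypothetical layouts with the same total number of disks (20).
uniform = [4, 4, 4, 4, 4]
mixed = [2, 2, 2, 2, 12]  # four small nodes plus one storage-heavy node

print("uniform:", locality_skew(uniform))
print("mixed:  ", locality_skew(mixed))
```

Under these assumptions, every node in the uniform layout holds its fair share of the data, while the storage-heavy node in the mixed layout holds about three times its share. Tasks reading that data would then either queue on that one node or pull the blocks across the network.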

Doug

Jason Venner wrote:
We are starting to build larger clusters, and want to better understand how to configure the network topology. Up to now we have just been setting up a private VLAN for the small clusters.

We have been thinking about the following machine configurations:
- Compute nodes with a number of spindles and medium disks, that also serve DFS.
- For every 4-8 of the above, one compute node with a large number of spindles and a large number of disks, to bulk out the DFS capacity.

We are wondering what the best practices are for network topology in clusters that are built out of the above building blocks.
We can readily have 2 or 4 network cards in each node.
