If you're building a cluster from scratch, why not put a medium number
of disks on every node, rather than more on some and fewer on others?
That's the optimal configuration for Hadoop, since it best distributes
data among computing nodes.
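
As a rough illustration of that point, here is a minimal Python sketch. It assumes that data ends up spread across nodes in proportion to their disk capacity and that every node has equal compute; both are simplifying assumptions, and the node counts and disk sizes are hypothetical. It reports how much more than its "fair share" of the data the fullest node holds.

```python
def max_data_to_compute_ratio(disk_tb):
    """Worst-case ratio of a node's share of the data to its share of compute.

    Assumes data is spread in proportion to disk capacity and that every
    node contributes an equal 1/n share of compute (both are assumptions).
    A result of 1.0 means storage and compute are perfectly matched.
    """
    n = len(disk_tb)
    total = sum(disk_tb)
    # data share = d / total, compute share = 1 / n, so ratio = d * n / total
    return max(d * n / total for d in disk_tb)


# Hypothetical layouts with the same total capacity (20 TB over 5 nodes).
uniform = [4.0] * 5          # medium disk on every node
mixed = [2.0] * 4 + [12.0]   # four small nodes plus one bulk storage node

print("uniform:", max_data_to_compute_ratio(uniform))
print("mixed:  ", max_data_to_compute_ratio(mixed))
```

Under these assumptions the uniform layout gives a ratio of 1.0, while in the mixed layout the bulk node holds three times its share of the data, so more of the computation over that data has to read it across the network.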
Doug
Jason Venner wrote:
We are starting to build larger clusters, and want to better understand
how to configure the network topology.
Up to now we have just been setting up a private VLAN for the small
clusters.
We have been thinking about the following machine configurations:
- Compute nodes with a number of spindles and medium disk, that also
  serve DFS.
- For every 4-8 of the above, one compute node with a large number of
  spindles and a large number of disks, to bulk out the DFS capacity.
We are wondering what the best practices are for network topology in
clusters that are built out of the above building blocks.
We can readily have 2 or 4 network cards in each node.