Because we acquired servers of different capacities at different times,
we have 2 servers with 1TB of disk each and 11 servers with ~300GB
each. HDFS tends to under-utilize the 1TB servers relative to their
capacity. This makes sense: block replicas need to be spread fairly
evenly across the cluster so that tasks can run close to their data.
For our next cluster, we're going with uniform disk, CPU, and memory
configurations.
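A rough way to see why the larger nodes sit partly empty is a deliberately simplified model. It assumes every DataNode ends up holding the same number of bytes, so the cluster counts as "full" once the smallest nodes fill. Real HDFS placement is not this strict, and 1TB is taken as 1000GB. The function name is hypothetical.

```python
# Simplified model (an assumption, not HDFS's actual placement policy):
# data is spread evenly across all DataNodes, so every node stores the
# same number of bytes and the smallest node limits usable capacity.

def even_placement_utilization(capacities_gb):
    """Return (usable_gb, per-node utilization) under strictly even placement."""
    smallest = min(capacities_gb)
    usable = smallest * len(capacities_gb)
    utilization = [smallest / c for c in capacities_gb]
    return usable, utilization

# The cluster described above: 2 x 1TB and 11 x ~300GB nodes.
capacities = [1000] * 2 + [300] * 11
usable, util = even_placement_utilization(capacities)
raw = sum(capacities)

print(f"raw capacity: {raw} GB")
print(f"usable under even placement: {usable} GB ({usable / raw:.0%} of raw)")
print(f"1TB node utilization: {util[0]:.0%}")
```

Under this model, 3900GB of the 5300GB raw capacity is usable and each 1TB node is only about 30% full. That is one argument for uniform disk configurations.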
The big question for me is how well a dual-CPU, quad-core (8 cores per
box) configuration will do. Has anyone tried this configuration with
Intel or AMD CPUs? Is the memory throughput sufficient?

Jason Venner wrote:
We are starting to build larger clusters, and want to better
understand how to configure the network topology.
Up to now, we have just been setting up a private VLAN for the small
clusters.
We have been thinking about the following machine configurations:
- Compute nodes with several spindles and medium disk capacity, which
  also serve DFS.
- For every 4-8 of the above, one compute node with a large number of
  spindles (many disks), to bulk out the DFS capacity.
We are wondering what the best practices are for network topology in
clusters that are built out of the above building blocks.
We can readily have 2 or 4 network cards in each node.
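Whatever physical topology is chosen, Hadoop learns it through rack awareness. The NameNode and JobTracker run the script named by the `topology.script.file.name` property (`net.topology.script.file.name` in later releases). They pass it host names or IP addresses and read back one rack path per argument. That mapping drives cross-rack replica placement and rack-local task scheduling. Below is a minimal sketch of such a script. The subnet-to-rack table, with one /24 subnet per top-of-rack switch, is purely hypothetical and would need to match the real network layout.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack-awareness topology script.

Hadoop calls this script with one or more host names or IP addresses as
arguments and expects one rack path per argument on stdout.
"""
import ipaddress
import socket
import sys

# Hypothetical mapping: one /24 subnet per rack (per top-of-rack switch).
RACKS = {
    ipaddress.ip_network("10.1.1.0/24"): "/rack1",
    ipaddress.ip_network("10.1.2.0/24"): "/rack2",
}
# Rack Hadoop uses when no mapping is known.
DEFAULT_RACK = "/default-rack"


def rack_for(host):
    """Map a host name or IP address to a rack path."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address: try resolving it as a host name.
        try:
            addr = ipaddress.ip_address(socket.gethostbyname(host))
        except (OSError, ValueError):
            return DEFAULT_RACK
    for net, rack in RACKS.items():
        if addr in net:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Emit one rack path per argument, whitespace-separated, in argument order.
    print(" ".join(rack_for(h) for h in sys.argv[1:]))
```

Invoked as `topology.py 10.1.1.5 10.1.2.7`, it would print `/rack1 /rack2`. Keying racks on subnets keeps the script in step with the switch layout. With 2 or 4 NICs per node, the address registered with Hadoop is the one that should appear in the table.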