Because we acquired servers of different capacities at different times, we have 2 servers with 1TB of disk each and 11 servers with ~300GB each. HDFS tends to under-utilize the 1TB servers relative to their capacity. This makes sense: block replicas need to be distributed relatively evenly across the cluster so that tasks can run close to their data. For our next cluster, we're going with uniform disk, CPU, and memory configurations. The big question for me is how well a dual-CPU, 4-core configuration (8 cores per box) will perform. Has anyone tried this configuration with Intel or AMD CPUs? Is the memory throughput sufficient?
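To put rough numbers on the under-utilization, here is a deliberately simplified model of the reasoning above, not a description of how the HDFS block placement policy or balancer actually decides. It assumes every datanode ends up holding about the same number of bytes, so the cluster is effectively full once the smallest nodes are full. It also treats "1TB" as 1000GB and "~300GB" as exactly 300GB; the `NodeGroup` type and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    count: int        # number of identical datanodes in this group
    capacity_gb: int  # disk per node, in GB (approximate)


def raw_capacity(groups: list[NodeGroup]) -> int:
    """Total disk across all nodes."""
    return sum(g.count * g.capacity_gb for g in groups)


def usable_under_even_spread(groups: list[NodeGroup]) -> int:
    """Assumption: equal bytes per node, so the smallest node caps everyone."""
    nodes = sum(g.count for g in groups)
    smallest = min(g.capacity_gb for g in groups)
    return nodes * smallest


cluster = [NodeGroup(count=2, capacity_gb=1000), NodeGroup(count=11, capacity_gb=300)]
raw = raw_capacity(cluster)                 # 2*1000 + 11*300 = 5300 GB
usable = usable_under_even_spread(cluster)  # 13 * 300 = 3900 GB
big_node_fill = 300 / 1000                  # 1TB nodes ~30% full when 300GB nodes are full
print(raw, usable, round(usable / raw, 3), big_node_fill)
```

Under this model roughly a quarter of the raw disk is stranded on the two large nodes, which is the argument for uniform node configurations.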

Jason Venner wrote:
We are starting to build larger clusters and want to better understand how to configure the network topology. Up to now we have just been setting up a private VLAN for the small clusters.

We have been thinking about the following machine configurations:
- Compute nodes with a number of spindles and medium-sized disks, which also serve DFS.
- For every 4-8 of the above, one compute node with a large number of spindles (disks), to bulk out the DFS capacity.

We are wondering what the best practices are for network topology in clusters that are built out of the above building blocks.
We can readily have 2 or 4 network cards in each node.
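One concrete piece of any answer is telling Hadoop what the topology is, so it can place replicas across racks. Hadoop's rack awareness can call an external topology script (configured via `topology.script.file.name` in older releases, `net.topology.script.file.name` in Hadoop 2 and later), passing host names or IP addresses as arguments and reading one rack path per argument from stdout. Below is a minimal sketch, assuming each /24 subnet (for example, one VLAN per switch) corresponds to a rack; the subnets and rack names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script: map each host argument to a rack path."""
import sys

# Hypothetical mapping: first three octets of the IPv4 address -> rack path.
SUBNET_TO_RACK = {
    "10.1.1": "/dc1/rack1",
    "10.1.2": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def rack_for(host: str) -> str:
    """Return the rack for an IPv4 address; unknown hosts go to the default rack."""
    prefix = ".".join(host.split(".")[:3])
    return SUBNET_TO_RACK.get(prefix, DEFAULT_RACK)


def main(argv: list[str]) -> None:
    # One rack path per argument, whitespace-separated, in argument order.
    print(" ".join(rack_for(h) for h in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, invoking it as `topology.py 10.1.2.7 10.1.1.3` prints `/dc1/rack2 /dc1/rack1`. Keying racks on subnets only makes sense if the network layout actually follows that pattern; a lookup table keyed on host name works just as well.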
