Hi, I have a reasonably simple question that I thought I'd post to this list because I don't have enough experience with hardware to figure this out myself.
Let's assume that I have 2 separate cluster setups for slave nodes. The master node is a separate machine *outside* these clusters:

*Setup A*: 28 nodes, each with a 2-core CPU, 8 GB RAM and 1 SATA drive (1 TB)
*Setup B*: 7 nodes, each with an 8-core CPU, 32 GB RAM and 4 SATA drives (1 TB each)

Note that I have maintained the same *core:memory:spindle* ratio above. In essence, Setup B has the same overall processing, memory and spindle capacity, but achieved with a quarter as many nodes.

Ignoring the *cost* of each node above, and assuming 10 Gb Ethernet connectivity and the same speed per core across nodes in both scenarios, are Setup A and Setup B equivalent to each other in the context of setting up a Hadoop cluster? Or will the relative performance be different?

Excluding the network connectivity between the nodes, what other criteria might give one setup an edge over the other for regular Hadoop jobs?

Also, assuming the same type of Hadoop jobs on both clusters, how different would the load experienced by the master node be for each setup?

Thanks in advance,
Safdar
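
P.S. In case it helps to double-check my arithmetic, here is a minimal Python sketch that totals up the two setups. The `Setup` class and its field names are just my own illustration, not anything from Hadoop; it only multiplies the per-node figures given above by the node counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """Hypothetical description of one slave-node cluster layout."""
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int
    disks_per_node: int  # one 1 TB SATA drive = one spindle

    def totals(self):
        """Return aggregate (cores, RAM in GB, spindles) across all nodes."""
        return (
            self.nodes * self.cores_per_node,
            self.nodes * self.ram_gb_per_node,
            self.nodes * self.disks_per_node,
        )


# Figures taken directly from the two setups described above.
setup_a = Setup(nodes=28, cores_per_node=2, ram_gb_per_node=8, disks_per_node=1)
setup_b = Setup(nodes=7, cores_per_node=8, ram_gb_per_node=32, disks_per_node=4)

print("Setup A totals (cores, GB RAM, spindles):", setup_a.totals())
print("Setup B totals (cores, GB RAM, spindles):", setup_b.totals())
print("Same aggregate capacity:", setup_a.totals() == setup_b.totals())
print("Node count ratio A:B:", setup_a.nodes // setup_b.nodes)
```

Both setups come out to 56 cores, 224 GB RAM and 28 spindles, so the only thing that differs on paper is how that capacity is divided among nodes.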