We have 3 types of machines we can get: 2-disk, 6-disk and 16-disk
machines. They all have 4 dual-core CPUs.
The 2-disk machines have about 1 TB, the 6-disk machines about 3 TB and
the 16-disk machines about 8 TB. The 16-disk machines have CPUs that are
about 25% slower than those in the 2/6-disk machines.
We handle a lot of bulky data, and don't think we can fit it all on the
3 TB machines if those are our sole compute/DFS nodes.
From my reading, I conjecture that an ideal configuration would be 1
local disk per CPU for local data/reducing, and some number of separate
disks for DFS.
Is this an accurate assessment?
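
To make the numbers above easier to compare, here is a rough sketch of the
per-core arithmetic. The disk counts, capacities and the 25% CPU figure are
the approximate ones quoted above; the names (MACHINES, per_core_ratios,
dfs_disks_left) are made up for illustration, and reading "1 local disk per
cpu" as one disk per CPU socket (4 per machine) is an assumption.

```python
# Approximate per-machine figures from the message above.
# Every machine has 4 dual-core CPUs, i.e. 8 cores.
CPU_SOCKETS = 4
CORES_PER_MACHINE = CPU_SOCKETS * 2

# name -> (disks, approximate capacity in TB, relative CPU speed)
MACHINES = {
    "2-disk": (2, 1.0, 1.0),
    "6-disk": (6, 3.0, 1.0),
    "16-disk": (16, 8.0, 0.75),  # about 25% slower CPUs
}


def per_core_ratios(disks, capacity_tb):
    """Return (disks per core, TB per core) for one machine."""
    return disks / CORES_PER_MACHINE, capacity_tb / CORES_PER_MACHINE


def dfs_disks_left(disks, local_disks=CPU_SOCKETS):
    """Disks left for DFS after reserving local_disks for local data.

    Assumption: one local disk per CPU socket. Returns 0 when the
    machine has too few disks to reserve that many.
    """
    return max(disks - local_disks, 0)


for name, (disks, tb, speed) in MACHINES.items():
    d, t = per_core_ratios(disks, tb)
    print(f"{name}: {d:.2f} disks/core, {t:.3f} TB/core, "
          f"{dfs_disks_left(disks)} disks left for DFS, "
          f"relative CPU speed {speed}")
```

Under that reading, the 2-disk machines cannot reserve one local disk per
CPU at all, the 6-disk machines would keep 2 disks for DFS, and the 16-disk
machines 12.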
Doug Cutting wrote:
If you're building a cluster from scratch, why not put a medium number
of disks on all nodes, rather than some with more and some with less?
That's the optimal configuration for Hadoop, since it best distributes
data among computing nodes.
Doug
Jason Venner wrote:
We are starting to build larger clusters, and want to better
understand how to configure the network topology.
Up to now we have just been setting up a private vlan for the small
clusters.
We have been thinking about the following machine configurations:
Compute nodes with a number of spindles and medium disk, that also
serve DFS
For every 4-8 of the above, one compute node with a large number of
spindles with a large number of disks, to bulk out the DFS capacity.
We are wondering what the best practices are for network topology in
clusters that are built out of the above building blocks.
We can readily have 2 or 4 network cards in each node.
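
As a back-of-the-envelope illustration of what the 4-8 to 1 mix implies,
the sketch below estimates how much of a group's storage and compute would
sit on the one large node. It assumes the medium nodes are the roughly 3 TB
machines and the large node the roughly 8 TB machine with about 25% slower
CPUs (figures from the follow-up message), and that DFS blocks end up spread
in proportion to capacity. The function name large_node_shares is
hypothetical.

```python
# Approximate figures from the thread (assumed, not measured).
MEDIUM_TB = 3.0      # medium-disk compute node
LARGE_TB = 8.0       # large-disk node used to bulk out DFS
MEDIUM_SPEED = 1.0   # relative CPU speed
LARGE_SPEED = 0.75   # about 25% slower CPUs


def large_node_shares(medium_per_large):
    """Return (storage share, compute share) of the one large node
    in a group of medium_per_large medium nodes plus one large node."""
    total_tb = medium_per_large * MEDIUM_TB + LARGE_TB
    total_speed = medium_per_large * MEDIUM_SPEED + LARGE_SPEED
    return LARGE_TB / total_tb, LARGE_SPEED / total_speed


for n in (4, 8):
    storage, compute = large_node_shares(n)
    print(f"{n} medium + 1 large: large node holds {storage:.0%} of "
          f"storage and {compute:.0%} of compute")
```

If the large node holds a much bigger share of the data than of the compute,
more of the reads against it would presumably come over the network from
other nodes, which is why the mix matters for the topology question.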