Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Jason Venner Tue, 12 Feb 2008 12:53:37 -0800

We have 3 types of machines we can get: 2-disk, 6-disk and 16-disk machines. They all have 4 dual-core CPUs.

The 2-disk machines have about 1 TB, the 6-disk about 3 TB and the 16-disk about 8 TB. The 16-disk machines have about 25% slower CPUs than the 2/6-disk machines.

We handle a lot of bulky data, and don't think we can fit it all on the 3 TB machines if those are our sole compute/DFS nodes.
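As a rough capacity check, raw disk has to be divided by the HDFS replication factor (3 by default) to get usable DFS space. A minimal sketch, assuming a hypothetical node count and an optional hypothetical fraction of each disk held back for intermediate map/reduce data; neither figure comes from this thread.

```python
def usable_dfs_tb(node_count, tb_per_node, replication=3, reserved_fraction=0.0):
    """Approximate usable DFS capacity in TB.

    replication: HDFS replication factor (3 is the default).
    reserved_fraction: hypothetical share of each node's disk kept back
    for non-DFS use such as map output and reduce spill files.
    """
    if replication < 1:
        raise ValueError("replication must be >= 1")
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    raw = node_count * tb_per_node
    return raw * (1.0 - reserved_fraction) / replication


# Hypothetical 20-node cluster built only from the ~3 TB machines.
print(usable_dfs_tb(20, 3.0))                          # 20.0
print(usable_dfs_tb(20, 3.0, reserved_fraction=0.25))  # 15.0
```

So 60 TB of raw disk yields only about 20 TB of replicated DFS space, before any room is set aside for job output.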

From my reading, I conjecture that an ideal configuration would be 1 local disk per CPU for local data/reducing, and some number of separate disks for DFS.

Is this an accurate assessment?
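The conjectured layout can be checked against each machine type. A sketch, assuming "one disk per CPU" means one per socket (4 here; pass 8 to count cores instead) and that at least one disk must remain for DFS; `split_disks` is a hypothetical helper, not a Hadoop API.

```python
def split_disks(total_disks, cpus, min_dfs_disks=1):
    """Return (local_disks, dfs_disks) under the 'one local disk per CPU' rule.

    Local disks hold intermediate map/reduce data; the rest go to DFS.
    If there are too few disks, local disks are capped so that at least
    min_dfs_disks remain for DFS (an assumption, not a Hadoop rule).
    """
    if total_disks <= min_dfs_disks:
        raise ValueError("not enough disks to reserve any for local data")
    local = min(cpus, total_disks - min_dfs_disks)
    return local, total_disks - local


for disks in (2, 6, 16):
    print(disks, split_disks(disks, cpus=4))
# 2 (1, 1)
# 6 (4, 2)
# 16 (4, 12)
```

Under these assumptions only the 16-disk machines can follow the rule and still contribute meaningful DFS capacity; the 2-disk machines cannot meet it at all.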


Doug Cutting wrote:

If you're building a cluster from scratch, why not put a medium number of disks on all nodes, rather than some with more and some with less? That's the optimal configuration for Hadoop, since it best distributes data among computing nodes.
Doug
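Doug's point can be illustrated with the figures from this thread by comparing each machine class's share of storage with its share of compute. A rough sketch, assuming a hypothetical mix of 6 medium nodes per large node (inside the 4-8 ratio in the quoted message), that the cluster eventually fills every node's disks so stored data ends up roughly proportional to capacity, and that compute is measured as cores times relative CPU speed (0.75 for the 25%-slower 16-disk nodes).

```python
def shares(node_classes):
    """Return {name: (storage_share, compute_share)} for a cluster mix.

    node_classes maps a class name to (count, tb_per_node, cores, speed),
    where speed is CPU speed relative to the fastest class.
    """
    total_tb = sum(n * tb for n, tb, _, _ in node_classes.values())
    total_cpu = sum(n * cores * speed for n, _, cores, speed in node_classes.values())
    return {
        name: (n * tb / total_tb, n * cores * speed / total_cpu)
        for name, (n, tb, cores, speed) in node_classes.items()
    }


# Hypothetical mix: 6 six-disk nodes (~3 TB) per 16-disk node (~8 TB, 25% slower CPUs).
mix = {"medium": (6, 3.0, 8, 1.0), "large": (1, 8.0, 8, 0.75)}
for name, (storage, compute) in shares(mix).items():
    print(f"{name}: {storage:.0%} of storage, {compute:.0%} of compute")
# medium: 69% of storage, 89% of compute
# large: 31% of storage, 11% of compute
```

In this mix the large node would hold nearly three times its compute share of the data, so more tasks would read their input over the network; with identical nodes the two shares are equal.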

Jason Venner wrote:
We are starting to build larger clusters, and want to better understand how to configure the network topology. Up to now we have just been setting up a private VLAN for the small clusters.
We have been thinking about the following machine configurations:
- Compute nodes with a moderate number of spindles and medium-sized disks that also serve DFS.
- For every 4-8 of the above, one compute node with a large number of disks, to bulk out the DFS capacity.
We are wondering what the best practices are for network topology in clusters that are built out of the above building blocks.
We can readily have 2 or 4 network cards in each node.
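On the "what is a rack" question: Hadoop's rack awareness takes its notion of a rack from an administrator-supplied topology script, which is invoked with one or more host addresses as arguments and prints one rack path per argument. A minimal sketch, assuming one subnet (for example one VLAN) per rack; the subnets, rack names and the file name `rack_map.py` are hypothetical, and the configuration property that names the script differs between Hadoop versions, so check the documentation for yours.

```python
#!/usr/bin/env python3
"""Hypothetical rack-mapping script for Hadoop rack awareness.

Hadoop passes one or more host addresses as arguments and reads back
one rack path per argument, in the same order.
"""
import ipaddress
import sys

# Hypothetical layout: one /24 subnet (VLAN) per rack.
RACKS = {
    ipaddress.ip_network("10.1.1.0/24"): "/switch1/rack1",
    ipaddress.ip_network("10.1.2.0/24"): "/switch1/rack2",
    ipaddress.ip_network("10.1.3.0/24"): "/switch2/rack3",
}
DEFAULT_RACK = "/default-rack"


def rack_for(host):
    """Map a host address to a rack path; unknown hosts get the default rack."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return DEFAULT_RACK  # a hostname rather than an IP; not resolved here
    for network, rack in RACKS.items():
        if addr in network:
            return rack
    return DEFAULT_RACK


def main(args):
    # One output line per argument keeps the order Hadoop expects.
    for host in args:
        print(rack_for(host))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python3 rack_map.py 10.1.1.5 10.1.3.9` prints `/switch1/rack1` and `/switch2/rack3`. Because rack-aware placement tries to keep replicas on more than one rack, a mapping that matches the real switch layout is what lets a block survive the loss of a single switch.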
