I concur: it is much better to have enough storage in the compute farm
that DFS files can be local to the compute tasks.

Also, a 16-disk machine typically costs a good bit more than a 6-disk
machine plus 10 disks, because you usually need a second chassis.  Sun's
Thumper would be an interesting counter-example to this.

I have found (in my limited experience) that you want as many disk
controllers as you can get and that you want the disk as close to your
compute power as possible.  For me, that means that my ideal machine is a
moderate CPU or two attached to 1-3 TB of storage.  My smallest machines
have a slow CPU with two SATA drives (could be 2 x 500GB, but mostly they are
500GB + 73GB for historical reasons).  These machines can be had for <$500
second-hand and <$1000 new from reputable vendors.  My larger machines have
6 disks and dual Xeons, but they cost about $3-4K, deliver only about twice
the net Hadoop throughput of a small machine, and take up twice the rack
space.  I would *much* rather have 6 times as many of the little boxes.
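
The arithmetic behind that preference can be sketched roughly as below.
This is only an illustration built on the figures quoted above: throughput
is in arbitrary units normalized so that one small box = 1.0, the $500
price is the quoted second-hand ceiling, and $3500 is an assumed midpoint
of the $3-4K range.  The function names are made up for this example.

```python
# Back-of-the-envelope comparison of small vs. large Hadoop nodes.
# Throughput is relative: one small box = 1.0, one large box = 2.0
# ("about twice the net Hadoop throughput").  Prices are assumptions
# taken from the quoted ranges, not measured figures.

def throughput_per_kilodollar(cost_usd, relative_throughput):
    """Relative throughput bought per $1000 spent on one box."""
    return relative_throughput / (cost_usd / 1000.0)

def boxes_for_budget(budget_usd, cost_usd):
    """Whole number of boxes that fit in the budget."""
    return budget_usd // cost_usd

def aggregate_throughput(budget_usd, cost_usd, relative_throughput):
    """Total relative throughput of all boxes the budget can buy."""
    return boxes_for_budget(budget_usd, cost_usd) * relative_throughput

SMALL_COST = 500    # second-hand ceiling quoted above
LARGE_COST = 3500   # assumed midpoint of the $3-4K range
BUDGET = 3500       # the price of one large box

small_total = aggregate_throughput(BUDGET, SMALL_COST, 1.0)
large_total = aggregate_throughput(BUDGET, LARGE_COST, 2.0)

print("small boxes bought:", boxes_for_budget(BUDGET, SMALL_COST))
print("small aggregate throughput:", small_total)
print("large aggregate throughput:", large_total)
```

Under these assumptions the money for one large box buys seven small
ones and several times the aggregate throughput.  Per unit of rack space
the two come out about even (twice the throughput in twice the space), so
the difference is almost entirely in price.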


On 2/12/08 1:01 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

>> From my reading, I conjecture that an ideal configuration would be 1
>> local disk per cpu for local data/reducing, and some number of separate
>> disks for dfs.
>> Is this an accurate assessment?
> 
> DFS storage is typically local on compute nodes.
