Re: How 'commodity' is 'commodity'

Steve Loughran Tue, 29 Sep 2009 03:02:52 -0700

"commodity" really means x86 parts, non-RAID storage, no infiniband-connected storage array, no esoteric OS -just Linux- and commodity gigabit ether, nothing fancy like 10GBE except on a heavily-utilised backbone :) With that kind of configuration, you reduce your capital costs, leaving you more money to spend on the electricity bill. I'd still go for RAID and/or NFS-mounted RAID for bits of the namenode/secondary namenode if you care about the data.


Taeho Kang wrote:

If your "commodity" pc's don't have a whole lot of storage space, then you
would have to run your HDFS datanodes elsewhere. In that case, a lot of data
traffic will occur (e.g. sending data from datanodes to where data
processing occurs), meaning map reduce performance will be slowed down. It's
always good to have the actual data on the same machine where the processing
will occur, or there will be extra network i/o involved.

If you decide to host datanodes on pc's, then you also have to be able to
protect the data. (e.g. make sure people don't accidentally delete data
blocks.)

Well, there are lots and lots of possibilities, and I would like to hear how
your plan goes, too!

I would go for storing data off the desktop machines, and just using them as compute nodes -tasktrackers. This reduces the impact of them going offline without warning but lets them do useful work. This will bump up their bandwidth needs though.

This still leaves you with the problem of configuring the hadoop cluster for all these machines, especially if they are different. To work around that, why not create a VirtualBox or VMWare OS image containing the hadoop binaries and configuration files? Everyone who runs the OS image joins the cluster, but as soon as they pause it, that tasktracker goes away.

When run virtualized, HDD and network IO is slower, but if you are only connecting to network storage, that network throttling could be useful: it will cut back on LAN bandwidth. CPU performance can often be comparable, so if your code is CPU-intensive, this can work.
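
To put a rough number on the bandwidth point for compute-only desktops, here is a back-of-envelope sketch. Every figure in it (20 desktops, 2 tasks per machine, 10 MB/s read per task) is a made-up assumption for illustration, not a measurement, and the function names are hypothetical; plug in your own numbers.

```python
import math

# 1 Gbit/s is 125 MB/s before any protocol overhead (real throughput is lower).
GIGABIT_LINK_MB_S = 1000 / 8


def aggregate_read_demand_mb_s(tasktrackers, tasks_per_node, mb_s_per_task):
    """Total read rate compute-only nodes would pull from remote storage."""
    return tasktrackers * tasks_per_node * mb_s_per_task


def links_needed(demand_mb_s, link_mb_s=GIGABIT_LINK_MB_S):
    """Smallest number of fully used links that could carry the demand."""
    return math.ceil(demand_mb_s / link_mb_s)


# Assumed example: 20 desktops, 2 tasks each, 10 MB/s read per task.
demand = aggregate_read_demand_mb_s(20, 2, 10)
print(demand, links_needed(demand))  # 400 4
```

With those assumed figures the storage side would need to serve about 400 MB/s, more than three saturated gigabit links' worth, which is why the backbone is the one place where something faster may be worth paying for.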
