"commodity" really means x86 parts, non-RAID storage, no infiniband-connected storage array, no esoteric OS -just Linux- and commodity gigabit ether, nothing fancy like 10GBE except on a heavy-utilised backbone :) With those kind of configurations, you reduce your capital costs, leaving you more money to spend on the electricity bill. I'd still go for RAID and/or NFS-mounted RAID for bits of the namenode/2ary namenode if you care about the data.

Taeho Kang wrote:
If your "commodity" pc's don't have a whole lot of storage space, then you
would have to run your HDFS datanodes elsewhere. In that case, a lot of data
traffic will occur (e.g. sending data from datanodes to where data
processing occurs), meaning map reduce performance will be slowed down. It's
always good to have the actual data on the same machine where the processing
will occur, or there will be extra network i/o involved.

If you decide to host datanodes on pc's, then you also have to be able to
protect the data. (e.g. make sure people don't accidentally delete data
blocks.)

Well, there are lots and lots of possibilities, and I would like to hear how
your plan goes, too!

I would go for storing data off the desktop machines, and just using them as compute nodes -tasktrackers. This reduces the impact of them going offline without warning but lets them do useful work. This will bump up their bandwidth needs though.

This still leaves you with the problem of configuring the hadoop cluster for all these machines, especially if they are different. To work around that, why not creating a VirtualBox or VMWare OS image containing the hadoop binaries and configuration files. Everyone who runs the OS image joins the cluster, but as soon as they pause it, that tasktracker goes away.

When run Virtualized, HDD and network IO is slower, but if you are only connecting to network storage, that network throttling could be useful, it will cut back on LAN bandwidth. CPU performance can often be comparable, so if your code is CPU-intensive, this can work

Reply via email to