If your "commodity" pc's don't have a whole lot of storage space, then you would have to run your HDFS datanodes elsewhere. In that case, a lot of data traffic will occur (e.g. sending data from datanodes to where data processing occurs), meaning map reduce performance will be slowed down. It's always good to have the actual data on the same machine where the processing will occur, or there will be extra network i/o involved.
If you decide to host datanodes on the PCs, then you also have to be able to protect the data (e.g., make sure people don't accidentally delete data blocks). Well, there are lots and lots of possibilities, and I would like to hear how your plan goes, too!

On Tue, Sep 29, 2009 at 12:45 PM, James Carroll <[email protected]> wrote:
> I work in a call center, which means we have a lot of PCs sitting on
> agents' desks doing a whole lot of nothing in the middle of the night. It
> also means that we collect a lot of phone and other data that all
> gets rolled up into reports and/or tables that drive reports or other
> processes. We're pushing the limits of what our current data processing
> can do, and I'd like to pitch Hadoop/HDFS/Pig to my boss. So, bottom line,
> before I go too much further: can we create a Hadoop cluster across all
> those desktop PCs, start/wake it up once everyone has gone home, load
> the data, do the analysis, and then creep back into the shadows before
> anyone is the wiser? Or would the slave nodes have to be 'dedicated',
> such that they wouldn't be able to do anything other than that? We'll figure
> out the capacity aspects later, if I can get a proof of concept approved
> to at least try. The PCs are, you guessed it, Windows machines.
>
> Thanks!
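On protecting data from accidental deletion: HDFS has an optional trash feature, where deleted files are moved to a per-user .Trash directory instead of being removed immediately. A sketch of the core-site.xml fragment that enables it (the interval shown is an arbitrary example value):

```xml
<!-- core-site.xml: keep deleted files in a per-user .Trash directory -->
<property>
  <name>fs.trash.interval</name>
  <!-- minutes before trash is permanently emptied; 1440 = 24 hours -->
  <value>1440</value>
</property>
```

This only guards against `hadoop fs -rm` mistakes from the shell, not against someone wiping the datanode's local disk, so you would still want to keep the block storage directories out of ordinary users' reach.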

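Since the idea is to bring the cluster up only after hours, a minimal sketch of how that could be scheduled from the master node, assuming a Unix-style master with Hadoop installed under /opt/hadoop (both hypothetical; the Windows desktops themselves would need Cygwin or similar to run the Hadoop scripts):

```shell
# Hypothetical crontab entries on the master node (paths are assumptions).
# start-all.sh / stop-all.sh are the standard Hadoop control scripts.
0 22 * * * /opt/hadoop/bin/start-all.sh   # bring the cluster up at 22:00
0 5  * * * /opt/hadoop/bin/stop-all.sh    # shut it down before 05:00
```

Keep in mind that stopping the cluster every morning means any job still running at 05:00 is killed, so the nightly window has to be sized to the longest job.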