You can't use it yet, but https://issues.apache.org/jira/browse/HADOOP-3799 (Design a pluggable interface to place replicas of blocks in HDFS) would let you write your own policy so that blocks are never placed locally. It might be worth following its development to see whether it can meet your need.
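To make the idea concrete, here is a toy sketch (in Python, not the eventual Hadoop Java interface, which didn't exist yet at the time of this thread) of the "never place a replica on the writer's node" policy that a pluggable placement interface could enable. The function and host names are hypothetical stand-ins, not a real Hadoop API.

```python
import random

def choose_targets(writer, candidates, replication=3):
    """Pick replica targets, excluding the writer's own node.

    A simulation of the 'no local replica' policy HADOOP-3799 could
    allow; `writer` and `candidates` are hypothetical hostnames.
    """
    remote = [node for node in candidates if node != writer]
    if len(remote) < replication:
        raise ValueError("not enough remote datanodes for requested replication")
    # The real namenode would also weigh rack topology and load;
    # here we just pick uniformly at random among remote nodes.
    return random.sample(remote, replication)

targets = choose_targets("gridftp01",
                         ["gridftp01", "dn02", "dn03", "dn04", "dn05"])
```

With such a policy the co-located GridFTP server would write all three replicas to remote datanodes, spreading the incoming stream across the network instead of hammering the local disk.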
Cheers,
Tom

On Sat, May 23, 2009 at 8:06 PM, jason hadoop <jason.had...@gmail.com> wrote:
> Can you give your machines multiple IP addresses, and bind the grid server
> to a different IP than the datanode? With Solaris you could put it in a
> different zone.
>
> On Sat, May 23, 2009 at 10:13 AM, Brian Bockelman <bbock...@math.unl.edu> wrote:
>> Hey all,
>>
>> Had a problem I wanted to ask advice on. The Caltech site I work with
>> currently has a few GridFTP servers which are on the same physical machines
>> as the Hadoop datanodes, and a few that aren't. The GridFTP server has a
>> libhdfs backend which writes incoming network data into HDFS.
>>
>> They've found that the GridFTP servers which are co-located with an HDFS
>> datanode have poor performance, because data comes in at a much faster
>> rate than the local HDD can handle. The standalone GridFTP servers, however,
>> push data out to multiple nodes at once, and can handle the incoming data
>> just fine (>200 MB/s).
>>
>> Is there any way to turn off the preference for the local node? Can anyone
>> think of a good workaround to trick HDFS into thinking the client isn't on
>> the same node?
>>
>> Brian
>
> --
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
> www.prohadoopbook.com, a community for Hadoop Professionals
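Jason's multi-IP suggestion rests on one mechanism: a client socket can be bound to a chosen source address before connecting, so the peer sees that address rather than the one the datanode is registered under. A minimal sketch of that mechanism, using loopback as a stand-in for a machine's real secondary/alias IP:

```python
import socket

# Stand-in "datanode side": a listener on an ephemeral port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

# Stand-in "GridFTP side": bind to a specific source address before
# connecting. In practice this would be the secondary IP, so HDFS does
# not recognize the client as the local datanode's address.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.bind(("127.0.0.1", 0))
client.connect((host, port))

conn, peer = server.accept()
# peer[0] is the source address the client bound to.
```

Whether this actually defeats the namenode's local-node preference depends on how the client's hostname resolves, so treat it as an experiment rather than a guaranteed fix.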