Raghu Angadi wrote:
As a hack, you could tunnel NN traffic from the GridFTP clients through
a different machine (by changing fs.default.name). Alternatively, these
clients could use a SOCKS proxy.
A SOCKS proxy would not be useful, since you don't want datanode traffic
to go through the proxy.
Raghu.
The amount of traffic to the NN is small, so tunneling should not affect
performance.
Raghu.
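A minimal sketch of the fs.default.name change suggested above, placed
only in the configuration file read by the co-located GridFTP servers.
The host name and port are hypothetical placeholders, and it assumes
some forwarder on that other machine (for example an SSH port forward)
relays the connection to the real NN, so the NN sees the request
arriving from that machine rather than from the datanode host:

```xml
<!-- Client-side override for the co-located GridFTP servers only.
     tunnel-host.example.org:9000 is a hypothetical relay that forwards
     to the real NameNode; datanodes keep their normal setting. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://tunnel-host.example.org:9000</value>
</property>
```

Only the NN RPC traffic would take this detour; block data would still
flow directly between the client and the datanodes.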
Brian Bockelman wrote:
Hey all,
I have a problem I wanted to ask advice on. The Caltech site I work with
currently has a few GridFTP servers which are on the same physical
machines as the Hadoop datanodes, and a few that aren't. The GridFTP
server has a libhdfs backend which writes incoming network data into
HDFS.
They've found that the GridFTP servers which are co-located with an HDFS
datanode have poor performance because data is incoming at a much
faster rate than the HDD can handle. The standalone GridFTP servers,
however, push data out to multiple nodes at once, and can handle the
incoming data just fine (>200MB/s).
Is there any way to turn off the preference for the local node? Can
anyone think of a good workaround to trick HDFS into thinking the
client isn't on the same node?
Brian