Raghu Angadi wrote:
As a hack, you could tunnel NN traffic from the GridFTP clients through a different machine (by changing fs.default.name).

Alternatively, these clients could use a SOCKS proxy.

A SOCKS proxy would not be useful, since you don't want datanode traffic to go through the proxy.

Raghu.

The amount of traffic to the NN is small, so tunneling should not affect performance.

Raghu.

Brian Bockelman wrote:
Hey all,

Had a problem I wanted to ask advice on. The Caltech site I work with currently has a few GridFTP servers which are on the same physical machines as the Hadoop datanodes, and a few that aren't. The GridFTP server has a libhdfs backend which writes incoming network data into HDFS.

They've found that the GridFTP servers which are co-located with HDFS datanodes have poor performance because data is incoming at a much faster rate than the HDD can handle. The standalone GridFTP servers, however, push data out to multiple nodes at once, and can handle the incoming data just fine (>200MB/s).

Is there any way to turn off the preference for the local node? Can anyone think of a good workaround to trick HDFS into thinking the client isn't on the same node?

Brian
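
To illustrate the behaviour discussed in this thread, here is a minimal Python sketch of a simplified placement rule: the first replica of each block goes to the writer's own node when the address the namenode sees belongs to a datanode, and to a randomly chosen datanode otherwise. This is only a toy model, not HDFS code. The host names (dn1 to dn4, gridftp-standalone) and both function names are hypothetical, and the real policy also considers racks, load and free space.

```python
import random

def choose_first_replica(client_addr, datanodes, rng):
    # Toy rule (assumption): prefer the local node when the client address
    # seen by the namenode is a datanode; otherwise pick a datanode at random.
    if client_addr in datanodes:
        return client_addr
    return rng.choice(datanodes)

def first_replica_counts(client_addr, datanodes, num_blocks, seed=0):
    # Count how many blocks land their first replica on each datanode.
    rng = random.Random(seed)
    counts = {dn: 0 for dn in datanodes}
    for _ in range(num_blocks):
        counts[choose_first_replica(client_addr, datanodes, rng)] += 1
    return counts

datanodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical host names

# Writer co-located with a datanode: every first replica hits its local disk.
colocated = first_replica_counts("dn1", datanodes, 1000)

# Writer seen under a non-datanode address (standalone server, or NN traffic
# tunneled through another machine): first replicas are spread out.
standalone = first_replica_counts("gridftp-standalone", datanodes, 1000)

print("co-located:", colocated)
print("standalone:", standalone)
```

In this model the co-located writer sends every block to one disk, which matches the bottleneck described above, while a writer that the namenode does not recognise as a datanode spreads its writes across the cluster.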


