Michael Thomas wrote:
Hey guys,

During the SC09 exercise, our data transfer tool was using the FUSE interface to HDFS. As Brian said, we were also reading 16 files in parallel. This seemed to be the optimal number, beyond which the aggregate read rate did not improve.
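The "16 files in parallel" setup can be sketched as a batch of whole-file reads with a bounded number in flight at once. This is only an illustration under stated assumptions. It reads local paths with plain Java NIO, not through FUSE or the Hadoop client. The class name `ParallelFileReader`, the 1 MiB buffer, and the fixed-thread-pool approach are all hypothetical, not the transfer tool described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: reads whole files with at most `parallelism` in flight.
public class ParallelFileReader {

    // Reads every file fully using a fixed-size thread pool and
    // returns the total number of bytes read across all files.
    public static long readAll(List<Path> files, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path p : files) {
                results.add(pool.submit(() -> drain(p)));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // rethrows any read failure as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Streams one file through a fixed buffer (1 MiB, an arbitrary choice) and counts bytes.
    private static long drain(Path p) throws IOException {
        byte[] buf = new byte[1 << 20];
        long n = 0;
        try (InputStream in = Files.newInputStream(p)) {
            int r;
            while ((r = in.read(buf)) != -1) {
                n += r;
            }
        }
        return n;
    }
}
```

Calling `readAll(files, 16)` mirrors the parallelism reported above. Timing the call while varying the second argument is one way to see where the aggregate read rate stops improving.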

We have work scheduled to modify our data transfer tool to use the native Hadoop Java APIs. We also plan to run some additional tests offline to see whether the HDFS-FUSE interface is the bottleneck, as we suspect.

Regards,

--Mike

Was this all local data?

In Russ Perry's little paper "High Speed Raster Image Streaming For Digital Presses Using the Hadoop File System", he got 4Gb/s over the LAN by having a client app decide which datanode to pull each block from, rather than having the NN (NameNode) tell it which node to ask for which block.

"Measured stream rates approaching 4Gb/s were achieved which is close to the required rate for streaming pages containing rich designs to a digital press. This required only a minor extension to the Hadoop client to allow file blocks to be read in parallel from the Hadoop data nodes."

http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html
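The client-side idea can be illustrated roughly as follows. The sketch takes the list of datanodes holding a replica of each block. It then assigns each block to whichever of those nodes has been given the fewest blocks so far, so that reads spread across datanodes. The class `BlockScheduler` and the least-loaded rule are hypothetical. The report is not claimed to use this policy, and a real client would get replica locations from HDFS rather than from a hand-built list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side scheduler: picks which replica to read for each block.
public class BlockScheduler {

    // replicas.get(i) lists the datanodes holding a copy of block i.
    // Returns, per datanode, the block indices the client will fetch from it,
    // choosing for each block the replica node with the fewest blocks assigned so far.
    public static Map<String, List<Integer>> assign(List<List<String>> replicas) {
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        Map<String, Integer> load = new HashMap<>();
        for (int block = 0; block < replicas.size(); block++) {
            String best = null;
            for (String node : replicas.get(block)) {
                // Ties go to the first node listed, since the comparison is strict.
                if (best == null || load.getOrDefault(node, 0) < load.getOrDefault(best, 0)) {
                    best = node;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("block " + block + " has no replicas");
            }
            load.merge(best, 1, Integer::sum);
            plan.computeIfAbsent(best, k -> new ArrayList<>()).add(block);
        }
        return plan;
    }
}
```

Each datanode's block list could then be fetched by its own thread, so several nodes stream to the client at once instead of one block at a time.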
