Michael Thomas wrote:
Hey guys,
During the SC09 exercise, our data transfer tool was using the FUSE
interface to HDFS. As Brian said, we were also reading 16 files in
parallel. This seemed to be the optimal number, beyond which the
aggregate read rate did not improve.
We have work scheduled to modify our data transfer tool to use the
native Hadoop Java APIs, and to run some additional tests offline
to see whether the HDFS-FUSE interface is the bottleneck, as we suspect.
Regards,
--Mike
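
A sweep over reader counts like the one described above can be reproduced offline with something along these lines. This is only a minimal sketch, not the actual transfer tool: it assumes the files are reachable as ordinary paths (for example through a FUSE mount), it does not use the Hadoop APIs at all, and the names ParallelReadBench, drain and readAll are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness: measures aggregate read rate for N concurrent readers.
class ParallelReadBench {

    /** Reads one file to the end and returns the number of bytes read. */
    static long drain(Path file) throws IOException {
        byte[] buf = new byte[1 << 20]; // 1 MiB read buffer (arbitrary choice)
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Reads all files with at most `parallelism` concurrent readers; returns total bytes. */
    static long readAll(List<Path> files, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (Path f : files) {
                pending.add(pool.submit(() -> drain(f)));
            }
            long total = 0;
            for (Future<Long> p : pending) {
                total += p.get(); // rethrows any read failure from the worker
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String a : args) {
            files.add(Path.of(a));
        }
        // Sweep the reader count to see where the aggregate rate stops improving.
        for (int n : new int[] {1, 2, 4, 8, 16, 32}) {
            long start = System.nanoTime();
            long bytes = readAll(files, n);
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%2d readers: %.1f MB/s%n", n, bytes / 1e6 / secs);
        }
    }
}
```

Note that repeated passes over the same files will be skewed by the OS page cache, so each reader count should really be given a fresh set of files (or the cache dropped between runs).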
Was this all local data?
In Russ Perry's little paper "High Speed Raster Image Streaming For
Digital Presses Using the Hadoop File System", he got 4Gb/s over the LAN
by having a client app decide which datanode to pull each block from,
rather than having the NN tell it which node to ask for which block:
"Measured stream rates approaching 4Gb/s were achieved which is close to
the required rate for streaming pages containing rich designs to a
digital press. This required only a minor extension to the Hadoop client
to allow file blocks to be read in parallel from the Hadoop data nodes."
http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html
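
The general idea of fetching a file's blocks concurrently and reassembling them in order can be sketched as below. This is not the paper's Hadoop client extension: it assumes a local file stands in for the HDFS file and fixed-size byte ranges stand in for blocks, with no datanode selection at all. ParallelBlockReader and readBlocksInParallel are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in: a local file plays the HDFS file, byte ranges play the blocks.
class ParallelBlockReader {

    /** Reads the file as fixed-size blocks fetched concurrently, returned in file order. */
    static byte[] readBlocksInParallel(Path file, int blockSize, int parallelism)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE - 8) {
                throw new IOException("file too large for a single array");
            }
            byte[] out = new byte[(int) size];
            List<Future<?>> pending = new ArrayList<>();
            for (long off = 0; off < size; off += blockSize) {
                final long start = off;
                final int len = (int) Math.min(blockSize, size - start);
                pending.add(pool.submit(() -> {
                    // Each task owns a disjoint slice of `out`, so no locking is needed.
                    ByteBuffer slice = ByteBuffer.wrap(out, (int) start, len);
                    long pos = start;
                    while (slice.hasRemaining()) {
                        // Positional read: does not move the shared channel position.
                        int n = ch.read(slice, pos);
                        if (n < 0) {
                            throw new IOException("unexpected EOF at " + pos);
                        }
                        pos += n;
                    }
                    return null;
                }));
            }
            for (Future<?> p : pending) {
                p.get(); // wait for every block; rethrows a failed read
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each block lands at its own offset in the output, completion order does not matter; a streaming consumer like the one in the paper would instead need to hand blocks downstream in order as they arrive.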