Hi,

I have a question about how to efficiently access multiple files during the
Reduce phase.  The reducer gets a <key, list of values> pair in which each
key identifies a different file and each value says where to look in that
file.  The files are .png images.

I have tried using the DistributedCache: I copy all the files to HDFS, and
then during the Reduce phase I call

Path[] localFiles = DistributedCache.getLocalCacheFiles(configuration);

and then I pick the path of the file I need from localFiles and process it.
However, I'm noticing that copying the files to HDFS takes a long time.  I'm
wondering whether it would be better to leave the files on the local file
system and open them directly during the Reduce phase.  I don't know whether
that is possible, though.
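
To make the lookup step concrete, here is a minimal sketch of picking a file
out of the local cache paths by name.  It assumes the reducer key is the bare
file name (for example "a.png"), which may not match every setup.  The class
and method names are hypothetical, and plain java.nio paths stand in for
Hadoop's Path so the snippet runs without a cluster.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: indexes local cache paths by
// file name so that each reducer key can be resolved with one map lookup
// instead of scanning the whole array for every key.
final class CacheFileIndex {

    private CacheFileIndex() {
    }

    // Builds a map from file name (last path element) to its full local path.
    // If two paths share a file name, the later one in the list wins.
    static Map<String, Path> indexByName(List<Path> localFiles) {
        Map<String, Path> index = new HashMap<>();
        for (Path file : localFiles) {
            Path name = file.getFileName();
            if (name != null) { // getFileName() is null for a root path
                index.put(name.toString(), file);
            }
        }
        return index;
    }
}
```

The index would be built once per task (for example when the reducer is
configured) and then queried with each key; a missing key returns null, which
the caller has to handle.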

In general, I'm wondering how to efficiently access multiple files during
either the Map or the Reduce phase.  Is DistributedCache the best way?

Thanks.
-- 
View this message in context: 
http://www.nabble.com/accessing-multiple-files-in-Reducer-tp23413154p23413154.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
