Biro,

I guess you could write these archives onto HDFS and have your reducers read them from a known location there, but this method may be a bit ugly. See http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F for the proper way to write files from tasks onto the DFS, or look at the MultipleOutputs API class.
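For instance, here is a minimal sketch of that FAQ approach (class name and local archive path are mine, not from your job): the mapper writes its archive under the task attempt's work output path, so the OutputCommitter only promotes it to the real output directory if the attempt succeeds, and speculative or failed attempts can't leave half-written files behind.

import java.io.FileInputStream;
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ArchiveWritingMapper
    extends Mapper<LongWritable, Text, Text, Text> {

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // Task attempt's work dir; promoted atomically on task success.
    Path workDir = FileOutputFormat.getWorkOutputPath(context);
    // Unique name per task so parallel map tasks don't collide.
    Path archive = new Path(workDir,
        "archive-" + context.getTaskAttemptID().getTaskID() + ".tar");
    FileSystem fs = archive.getFileSystem(context.getConfiguration());
    // Hypothetical local path where your script left the archive.
    FileInputStream in = new FileInputStream("/tmp/mapper-archive.tar");
    FSDataOutputStream out = fs.create(archive, false);
    try {
      IOUtils.copyBytes(in, out, context.getConfiguration(), false);
    } finally {
      in.close();
      out.close();
    }
  }
}

Your single reducer (or a follow-up job) can then list and read the archives from that output directory with the regular FileSystem API.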
Depending on how large these files are, you could also perhaps ship them in via the KV pairs themselves. A custom key or sort comparator can further ensure that they are delivered in the first iteration of the reducer, if the file is required before the regular reduce() calls can begin. A rough sketch of this follows after the quoted message below.

On Mon, May 21, 2012 at 1:42 PM, biro lehel <lehel.b...@yahoo.com> wrote:
> Dear all,
>
> In my Mapper, I run a script that processes my set of input text files and
> creates some other text files from them (this is done locally on the FS on
> my nodes), so each map task will produce an archive as a result. My issue
> is that I'm looking for a way for the Reducer to "take" these archives as
> some kind of input. I understand that the communication between Mapper and
> Reducer is done by means of the key-value pairs in the Context, but what I
> need is to transfer these archive files to the respective Reducer (I would
> probably have one single Reducer, so all the files should be
> transferred/copied there somehow).
>
> Is this possible? Is there a way to transfer files from Mapper to Reducer?
> If not, what is the best approach in scenarios like mine? Any suggestions
> would be greatly appreciated.
>
> Thank you in advance,
> Lehel.

--
Harsh J
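A rough sketch of the KV-shipping idea (all names are hypothetical; assumes a single reducer and archives small enough to hold in memory as one value). Text's default comparator sorts by byte order, so a "0:" key prefix is guaranteed to arrive at the reducer before any "1:" key:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ShipArchiveMapper
    extends Mapper<LongWritable, Text, Text, BytesWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Regular records go out under the "1:" prefix; the empty
    // BytesWritable here stands in for whatever payload they carry.
    context.write(new Text("1:" + value.toString()), new BytesWritable());
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // Hypothetical local path where your script left the archive.
    byte[] bytes = Files.readAllBytes(Paths.get("/tmp/mapper-archive.tar"));
    // Whole archive becomes one value; fine for small files, not for
    // multi-GB archives.
    context.write(new Text("0:archive"), new BytesWritable(bytes));
  }
}

With job.setNumReduceTasks(1), the reducer's iteration sees all the "0:archive" values (one per map task) before any regular record, so it can unpack them before the real reduce work starts. For large archives, prefer the HDFS side-file approach above.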