Be careful putting them in HDFS.  It does not scale very well: the number
of file opens will be on the order of Number of Mappers * Number of
Reducers, so e.g. 2,000 map tasks feeding 500 reduce tasks means around a
million file operations against the NameNode.  You can quickly mount a
denial of service on the NameNode if you have a lot of mappers and
reducers.

--Bobby Evans

On 5/21/12 4:02 AM, "Harsh J" <ha...@cloudera.com> wrote:

Biro,

I guess you could write these archives onto HDFS and have your
reducers read them from a location there, but this method may be a
bit ugly. See
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
for how to properly write files from tasks onto a DFS, or look at the
MultipleOutputs API class.
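
For the MultipleOutputs route, the shape is roughly this (untested
sketch against the new-API MultipleOutputs class; the "extra" named
output and the Text/Text types are just examples, not something your
job requires):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SideFileReducer extends Reducer<Text, Text, Text, Text> {

  // In the driver, declare the named output once:
  //   MultipleOutputs.addNamedOutput(job, "extra",
  //       TextOutputFormat.class, Text.class, Text.class);

  private MultipleOutputs<Text, Text> mos;

  @Override
  protected void setup(Context ctx) {
    mos = new MultipleOutputs<Text, Text>(ctx);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> vals, Context ctx)
      throws IOException, InterruptedException {
    for (Text v : vals) {
      // Writes land as side files under the job's output directory,
      // going through the task's normal attempt/commit machinery.
      mos.write("extra", key, v);
    }
  }

  @Override
  protected void cleanup(Context ctx)
      throws IOException, InterruptedException {
    mos.close();  // flushes the named outputs
  }
}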

Depending on how large these files are, you could also ship them in
via the KV pairs themselves. A custom key or sort comparator can
further ensure that they are delivered in the first iterations of the
reducer - useful if the file is required before the regular reduce()
work can begin.
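
Roughly, the key-prefix trick looks like this (untested sketch; the
extractKey/readLocalArchive/unpackArchive helpers are placeholders for
whatever your script produces, and it assumes each archive fits in
memory as a single value):

import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Regular records get a "1:" key prefix; the archive is emitted once
// from cleanup() under a "0:" prefix, which sorts before all "1:" keys.
class ShipMapper extends Mapper<LongWritable, Text, Text, BytesWritable> {
  @Override
  protected void map(LongWritable off, Text line, Context ctx)
      throws IOException, InterruptedException {
    byte[] copy = Arrays.copyOf(line.getBytes(), line.getLength());
    ctx.write(new Text("1:" + extractKey(line)), new BytesWritable(copy));
  }

  @Override
  protected void cleanup(Context ctx)
      throws IOException, InterruptedException {
    ctx.write(new Text("0:archive"), new BytesWritable(readLocalArchive()));
  }

  private String extractKey(Text line) {  // placeholder key extraction
    return line.toString();
  }

  private byte[] readLocalArchive() {     // placeholder: load the archive
    return new byte[0];                   // your script left on local disk
  }
}

// With a single reducer and Text's byte-wise default sort, every "0:"
// key arrives before any "1:" key, so the archives are in hand before
// the regular records are processed.
class GatherReducer extends Reducer<Text, BytesWritable, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<BytesWritable> vals, Context ctx)
      throws IOException, InterruptedException {
    if (key.toString().startsWith("0:")) {
      for (BytesWritable v : vals) {
        unpackArchive(v);                 // placeholder: stash for later
      }
    } else {
      // normal reduce() work; the archives are already available here
    }
  }

  private void unpackArchive(BytesWritable v) {
    // placeholder: use v.getBytes() up to v.getLength()
  }
}

With one reducer the default partitioner is enough; if you ever move to
multiple reducers you would also have to emit one tagged copy of the
archive per partition, plus a matching partitioner.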

On Mon, May 21, 2012 at 1:42 PM, biro lehel <lehel.b...@yahoo.com> wrote:
> Dear all,
>
> In my Mapper, I run a script that processes my set of input text files and
> creates some other text files from them (this happens locally, on the FS of
> my nodes), so each map task produces an archive. My issue is that I'm
> looking for a way for the Reducer to "take" these archives as some kind of
> input. I understand that communication between Mapper and Reducer happens
> through the key-value pairs in the Context, but what I need is to transfer
> these archive files to the respective Reducer (I will probably have one
> single Reducer, so all the files should be transferred/copied there
> somehow).
>
> Is this possible? Is there a way to transfer files from a Mapper to a
> Reducer? If not, what is the best approach in scenarios like mine? Any
> suggestions would be greatly appreciated.
>
> Thank you in advance,
> Lehel.
>



--
Harsh J
