Your request sounds very strange.

First off, different map objects are created on different machines (that IS
the point, after all), and thus any reading of data has to be done on at
least all of those machines.  The map object is only created once per split,
though, so that may be closer to what you are getting at.
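The split-per-mapper point can be sketched outside of Hadoop entirely.  The
toy Python below (illustrative names only, not Hadoop's actual API) shows the
idea: the framework divides the input into splits once, and each mapper
instance then reads only its own split, never the whole input.

```python
def make_splits(records, num_splits):
    """Divide the input into roughly equal chunks, one per mapper.
    In Hadoop this is the InputFormat's job; here it is simulated."""
    size = (len(records) + num_splits - 1) // num_splits
    return [records[i:i + size] for i in range(0, len(records), size)]

def run_mapper(split):
    """Each mapper instance sees only its own split.  This stand-in
    'map function' just emits a (record, 1) pair per input record."""
    return [(record, 1) for record in split]

records = ["a", "b", "c", "d", "e"]
splits = make_splits(records, 2)           # input is divided once
outputs = [run_mapper(s) for s in splits]  # one mapper per split
```

The point of the sketch is that no mapper ever touches the full `records`
list; parallelism comes precisely from each one working on a different part.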

Your basic requirement is a little odd, however, since you say that the
input to all of the maps is the same.  What is the point of parallelism in
that case?  Are your maps random in some sense?  Are they really operating
on different parts of the single input?  If so, shouldn't they just be
getting the part of the input that they will be working on?


Perhaps you should describe what you are trying to do at a higher level.  It
really sounds like you have taken a bit of an odd turn somewhere in porting
your algorithm to a parallel form.


On 3/12/08 9:24 AM, "Prasan Ary" <[EMAIL PROTECTED]> wrote:

> I have a very large XML file as input and a couple of Map/Reduce functions.
> The input key/value pair to all of my map functions is the same.
>   I was wondering if there is a way to read the input XML file only once,
> then create the key/value pairs (also once) and give these k/v pairs as input
> to my map functions, as opposed to having to read the XML and generate the
> key/value pairs once for each map function?
>    
>   thanks.
