That explains it.
 
The key/value objects are reused across the cycle of RecordReader reads
and mapper calls.
The MapWritable reader perhaps does not reset the MapWritable object
passed to it.
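
For illustration, here is a minimal sketch of why the reuse matters (a
hypothetical ClearingMapValue class written for this thread, not the actual
MapWritable source): because the map runner hands the same value instance to
every read, readFields has to clear any previous contents before
deserializing, otherwise entries accumulate from record to record.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    // Hypothetical value class; illustrates the reuse hazard, not MapWritable itself.
    public class ClearingMapValue implements Writable {
      private final Map<Text, Text> entries = new HashMap<Text, Text>();

      public void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<Text, Text> e : entries.entrySet()) {
          e.getKey().write(out);
          e.getValue().write(out);
        }
      }

      public void readFields(DataInput in) throws IOException {
        entries.clear();  // without this, a reused instance keeps old entries
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
          Text k = new Text();
          Text v = new Text();
          k.readFields(in);
          v.readFields(in);
          entries.put(k, v);
        }
      }
    }

A mapper-side workaround along the same lines would be to copy the value into
a fresh object before emitting it, but fixing the readFields side is the
cleaner option.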

Runping
 
> -----Original Message-----
> From: Mike Forrest [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, January 10, 2008 3:20 PM
> To: hadoop-user@lucene.apache.org
> Subject: Re: problem with IdentityMapper
> 
> I'm using Text for the keys and MapWritable for the values.
> 
> Joydeep Sen Sarma wrote:
> > What are the key/value types in the SequenceFile?
> >  
> > It seems that the MapRunner calls createKey and createValue just once,
> > so if the value serializes out its entire allocated memory (and not
> > just what it last read), it would cause this problem.
> >  
> > (I have periodically shot myself in the foot with this bullet).
> >
> > ________________________________
> >
> > From: Mike Forrest [mailto:[EMAIL PROTECTED]
> > Sent: Thu 1/10/2008 2:51 PM
> > To: hadoop-user@lucene.apache.org
> > Subject: problem with IdentityMapper
> >
> >
> >
> > Hi,
> > I'm running into a problem where IdentityMapper seems to produce way
> > too much data.  For example, I have a job that reads a sequence file
> > using IdentityMapper and then uses IdentityReducer to write everything
> > back out to another sequence file.  My input is a ~60MB sequence file
> > and after the map phase has completed, the job tracker UI reports
> > about 10GB for "Map output bytes".  It seems like the output collector
> > does not get properly reset and so each map that gets emitted has the
> > correct key but the value ends up being all the data you've
> > encountered up to that point.  I think this is a known issue but I
> > can't seem to find any discussion about it right now.  Has anyone else
> > run into this, and if so, is there a solution?  I'm using the latest
> > code in the 0.15 branch.
> > Thanks
> > Mike
