That explains it. The key/value objects are reused across the cycle of RecordReader.next() and Mapper.map() calls; the MapWritable reader perhaps does not reset the MapWritable object passed to it before deserializing the next record into it.
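To make the failure mode concrete, here is a minimal sketch of a map-valued Writable (a hypothetical class, not the actual MapWritable source) showing why readFields() has to wipe leftover state when the framework reuses one value object for every record:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Hypothetical map-valued Writable. The framework deserializes every
// record into the SAME instance, so readFields() must reset it first.
public class TextMapWritable implements Writable {
  private final Map<Text, Text> entries = new HashMap<Text, Text>();

  public void write(DataOutput out) throws IOException {
    out.writeInt(entries.size());
    for (Map.Entry<Text, Text> e : entries.entrySet()) {
      e.getKey().write(out);
      e.getValue().write(out);
    }
  }

  public void readFields(DataInput in) throws IOException {
    // Without this clear(), the entries from every record read so far
    // stay in the map, so each value written back out contains all the
    // data seen up to that point -- the ~60MB-in / ~10GB-out symptom
    // described below.
    entries.clear();
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      Text k = new Text();
      Text v = new Text();
      k.readFields(in);
      v.readFields(in);
      entries.put(k, v);
    }
  }
}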
Runping

> -----Original Message-----
> From: Mike Forrest [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 10, 2008 3:20 PM
> To: hadoop-user@lucene.apache.org
> Subject: Re: problem with IdentityMapper
>
> I'm using Text for the keys and MapWritable for the values.
>
> Joydeep Sen Sarma wrote:
> > What are the key/value types in the SequenceFile?
> >
> > It seems that the MapRunner calls createKey and createValue just
> > once, so if the value serializes out its entire allocated contents
> > (and not just what it last read), it would cause this problem.
> >
> > (I have periodically shot myself in the foot with this bullet.)
> >
> > ________________________________
> >
> > From: Mike Forrest [mailto:[EMAIL PROTECTED]
> > Sent: Thu 1/10/2008 2:51 PM
> > To: hadoop-user@lucene.apache.org
> > Subject: problem with IdentityMapper
> >
> > Hi,
> > I'm running into a problem where IdentityMapper seems to produce way
> > too much data. For example, I have a job that reads a sequence file
> > using IdentityMapper and then uses IdentityReducer to write
> > everything back out to another sequence file. My input is a ~60MB
> > sequence file, and after the map phase has completed, the job
> > tracker UI reports about 10GB for "Map output bytes". It seems like
> > the output collector does not get properly reset, so each map output
> > that gets emitted has the correct key but the value ends up being
> > all the data encountered up to that point. I think this is a known
> > issue, but I can't seem to find any discussion about it right now.
> > Has anyone else run into this, and if so, is there a solution? I'm
> > using the latest code in the 0.15 branch.
> > Thanks
> > Mike
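One possible workaround until the reader is fixed, sketched against the 0.15-era mapred API (untested; the class name is illustrative): replace IdentityMapper with a mapper that copies the reused value into a fresh MapWritable before emitting it, so whatever stale state lingers in the reused instance cannot leak into the output.

import java.io.IOException;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Illustrative workaround: emit a fresh MapWritable per record instead of
// the framework-reused instance. (This is a shallow copy: the entry
// objects inside are still shared, which is fine as long as readFields()
// allocates new ones for each entry.)
public class CopyingIdentityMapper extends MapReduceBase implements Mapper {
  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter)
      throws IOException {
    MapWritable copy = new MapWritable();
    copy.putAll((MapWritable) value);  // MapWritable implements java.util.Map
    output.collect(key, copy);
  }
}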