If you are referring to the Iterable in the reducer, it is special: the values are not all held in memory at once. Once the iterator passes a value, that value is gone and you cannot recover it. There is no LinkedList behind it.
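For example, here is a rough sketch of my own (assuming the new org.apache.hadoop.mapreduce API and Text values; the class name is made up) of what that single streaming pass looks like from inside a reducer:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class PassThroughReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // The framework streams values off the sorted/merged map output and
            // typically reuses the same Text instance on each next(), so the
            // whole group is never resident in memory at once.
            for (Text value : values) {
                context.write(key, value);   // consume and emit immediately
            }
            // Looping over 'values' a second time here would yield nothing:
            // the iterator is already exhausted and cannot be rewound.
        }
    }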
Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineering
Freddie Mac

"Berry, Matt" <mwbe...@amazon.com>
06/29/2012 01:06 PM
Please respond to mapreduce-user@hadoop.apache.org
To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
cc:
Subject: RE: Map Reduce Theory Question, getting OutOfMemoryError while reducing

I was actually quite curious as to how Hadoop was managing to get all of the records into the Iterable in the first place. I thought they were using a very specialized object that implements Iterable, but a heap dump shows they're likely just using a LinkedList. All I was doing was duplicating that object. Supposing I do as you suggest, am I in danger of having their list consume all the memory if a user decides to log 2x or 3x as much as they did this time?

~Matt

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Friday, June 29, 2012 6:52 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Map Reduce Theory Question, getting OutOfMemoryError while reducing

Hey Matt,

As far as I can tell, Hadoop isn't truly at fault here. If your issue is that you collect everything in a list before you store it, you should focus on that and avoid collecting it at all. Why not serialize the records as you receive them, if the incoming order is already taken care of? As far as I can tell, your AggregateRecords probably does nothing more than serialize the stored LinkedList. So instead of using a LinkedList, or even a composed Writable such as AggregateRecords, just write the records out as you receive them via each .next(). Would that not work for you? You may batch a constant number of records to gain some write performance, but at least you won't have to use up all your memory.

You can serialize as you receive by following this:
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F

--
Harsh J
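Below is a rough sketch of the approach Harsh describes: writing each record out as it arrives from .next() instead of collecting it into a LinkedList or an AggregateRecords. It assumes Text values and a side file created directly on HDFS via FileSystem; the class name and output path are purely illustrative and not from the original job.

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SerializeAsYouGoReducer extends Reducer<Text, Text, Text, Text> {
        private FileSystem fs;

        @Override
        protected void setup(Context context) throws IOException {
            fs = FileSystem.get(context.getConfiguration());
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // One side file per key (illustrative path; a real job should write
            // under the task attempt's work directory so speculative execution
            // and retries stay safe).
            Path out = new Path("/tmp/aggregated/" + key.toString());
            FSDataOutputStream stream = fs.create(out);
            try {
                for (Text value : values) {
                    // Write each record as soon as it arrives; nothing is
                    // buffered in a list, so memory use stays flat.
                    stream.write(value.getBytes(), 0, value.getLength());
                    stream.write('\n');
                }
            } finally {
                stream.close();
            }
        }
    }

Because nothing is accumulated between calls to next(), the reducer's memory footprint stays roughly constant even if a user logs two or three times as much as before.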