Hadoop reuses objects as an optimization, so if you need to keep a copy in memory you have to clone it yourself. I've never used Avro, but my guess is that the BARs are not reused, only the FOO.
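[Editor's note: a minimal sketch of the reuse pattern being discussed, with no Hadoop or Avro dependencies. `Foo`, `Bar`, and `reusingIterator` are hypothetical stand-ins for the generated classes and the reducer's value iterator; the real reducer iterable behaves analogously.]

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-ins for the Avro-generated classes from the thread.
class Bar {
    int value;
    Bar(int value) { this.value = value; }
    Bar copy() { return new Bar(value); }
}

class Foo {
    int a;
    List<Bar> barList = new ArrayList<>();
    // Deep copy: new Foo, new list, new Bar instances.
    Foo copy() {
        Foo c = new Foo();
        c.a = a;
        for (Bar b : barList) c.barList.add(b.copy());
        return c;
    }
}

public class ReuseDemo {
    // Mimics the reducer's iterator: ONE Foo instance (and its list)
    // is mutated in place and handed back on every next() call.
    static Iterator<Foo> reusingIterator(int n) {
        Foo shared = new Foo();
        return new Iterator<Foo>() {
            int i = 0;
            public boolean hasNext() { return i < n; }
            public Foo next() {
                shared.a = ++i;
                shared.barList.clear();
                shared.barList.add(new Bar(i * 10));
                return shared;
            }
        };
    }

    public static void main(String[] args) {
        // Storing the raw references keeps only the last value...
        List<Foo> raw = new ArrayList<>();
        Iterator<Foo> it = reusingIterator(3);
        while (it.hasNext()) raw.add(it.next());
        System.out.println(raw.get(0).a); // prints 3, not 1: same object three times

        // ...while deep-copying before storing preserves each record.
        List<Foo> copied = new ArrayList<>();
        it = reusingIterator(3);
        while (it.hasNext()) copied.add(it.next().copy());
        System.out.println(copied.get(0).a);                  // prints 1
        System.out.println(copied.get(0).barList.get(0).value); // prints 10
    }
}
```

The same pattern applies in a real reducer: copy any value you intend to hold past the current iteration, including the nested list and its elements.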
-Joey

On Wed, Aug 3, 2011 at 3:18 AM, Vyacheslav Zholudev
<vyacheslav.zholu...@gmail.com> wrote:
> Hi all,
>
> I'm using Avro as a serialization format, and assume I have a generated
> specific class FOO that I use as a Mapper output format:
>
> class FOO {
>     int a;
>     List<BAR> barList;
> }
>
> where BAR is another generated specific Java class.
>
> When I iterate over "Iterable<FOO> values" in the Reducer, it is clear that
> the same object of class FOO is reused, i.e.
>
> FOO foo1 = values.iterator.next();
> FOO foo2 = values.iterator.next();
> assertThat(foo1 == foo2, is(true));
>
> So I have the following questions:
> 1) Is the list barList reused over the next() calls?
> 2) If yes, can the objects in the barList be reused? For example, if the
> list contains two BAR objects the first time next() is called, and three
> objects the next time, are two of those equal by reference to the two from
> the first next() call? In other words, does Hadoop maintain some sort of
> "object pool"?
> 3) Why doesn't AvroTools generate clone() methods, since that would be quite
> straightforward and, more importantly, useful given that objects are reused?
>
> Thanks a lot in advance!
>
> Vyacheslav

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434