Hadoop reuses objects as an optimization: the same instance is refilled on
each call to next(). If you need to keep a value around past the current
iteration, you have to make a deep copy yourself. I've never used Avro, but
my guess is that the BAR objects are not reused, only the FOO.
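Here's a sketch of what that reuse looks like in plain Java, with no Hadoop or Avro dependency (the FooRecord class, reusingIterator factory, and deepCopy method are hypothetical stand-ins, not real Hadoop/Avro APIs). The iterator hands back the same instance every time, so only a deep copy survives the next call:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a generated record like FOO.
class FooRecord {
    int a;
    List<Integer> barList = new ArrayList<>();

    // Manual deep copy, since the generated class has no clone().
    FooRecord deepCopy() {
        FooRecord copy = new FooRecord();
        copy.a = this.a;
        copy.barList = new ArrayList<>(this.barList);
        return copy;
    }
}

public class ReuseDemo {
    // Mimics Hadoop's value iterator: one shared record, refilled per next().
    static Iterator<FooRecord> reusingIterator(List<int[]> raw) {
        FooRecord shared = new FooRecord();
        Iterator<int[]> it = raw.iterator();
        return new Iterator<FooRecord>() {
            public boolean hasNext() { return it.hasNext(); }
            public FooRecord next() {
                int[] row = it.next();
                shared.a = row[0];
                shared.barList.clear();
                for (int i = 1; i < row.length; i++) shared.barList.add(row[i]);
                return shared; // same object on every call
            }
        };
    }

    public static void main(String[] args) {
        List<int[]> raw = List.of(new int[]{1, 10}, new int[]{2, 20, 30});
        Iterator<FooRecord> values = reusingIterator(raw);

        FooRecord first = values.next();
        FooRecord kept = first.deepCopy(); // safe snapshot of the first value
        FooRecord second = values.next();

        System.out.println(first == second); // true: the instance was reused
        System.out.println(kept.a);          // 1: the copy kept the old state
    }
}
```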

-Joey

On Wed, Aug 3, 2011 at 3:18 AM, Vyacheslav Zholudev
<vyacheslav.zholu...@gmail.com> wrote:
> Hi all,
>
> I'm using Avro as a serialization format and assume I have a generated 
> specific class FOO that I use as a Mapper output format:
>
> class FOO {
>  int a;
>  List<BAR> barList;
> }
>
> where BAR is another generated specific Java class.
>
> When I iterate over "Iterable<FOO> values" in the Reducer, it is clear that 
> the same object of class FOO is reused, i.e.:
>
> Iterator<FOO> it = values.iterator();
> FOO foo1 = it.next();
> FOO foo2 = it.next();
> assertThat(foo1 == foo2, is(true));
>
> So I have the following questions:
> 1) Is the barList reused across next() calls?
> 2) If yes, can the objects inside barList be reused as well? For example, if 
> the list contains two BAR objects the first time next() is called, and three 
> objects the next time, can two of those three be equal by reference to the 
> two from the first call? In other words, does Hadoop maintain some sort of 
> "object pool"?
> 3) Why don't the Avro tools generate clone() methods? It would be quite 
> straightforward and, more importantly, useful given that objects are reused.
>
> Thanks a lot in advance!
>
> Vyacheslav
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
