Hi Folks,
I'm currently implementing a custom Pig loader. For each record read
from the underlying record reader I'm eagerly converting that data to Pig
data types (Tuple/Bag/Map). I've seen that the latest AvroStorage under
pig/builtin does not eagerly convert the data to Pig types; rather, it
implements Pig's Tuple/Map/Bag interfaces to minimize conversion
overhead. I was thinking of doing the same, though I have some questions.
I see the Avro[Tuple|Bag|Map]Wrapper classes are not read-only, i.e. they
implement Tuple.(set|append) and DataBag.add. Is this necessary, or can I
make my custom Tuple/Bag implementations read-only?
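To make that first question concrete, here is a rough sketch of the kind of
read-only wrapper I have in mind. MiniTuple is a made-up stand-in for the
handful of Tuple methods I care about (it is not the real
org.apache.pig.data.Tuple), and a plain List plays the role of the underlying
record. Whether Pig ever calls set/append on tuples produced by a loader is
exactly the part I'm unsure about.

```java
import java.util.List;

// Hypothetical, trimmed-down stand-in for the part of Pig's Tuple interface
// discussed here. NOT the real org.apache.pig.data.Tuple.
interface MiniTuple {
    int size();
    Object get(int fieldNum);
    void set(int fieldNum, Object val);
    void append(Object val);
}

// Read-only view over an underlying record (modelled here as a List).
// Reads delegate to the record; mutators are rejected.
final class ReadOnlyRecordTuple implements MiniTuple {
    private final List<Object> record;

    ReadOnlyRecordTuple(List<Object> record) {
        this.record = record;
    }

    @Override
    public int size() {
        return record.size();
    }

    @Override
    public Object get(int fieldNum) {
        // No eager conversion: the field is fetched only when asked for.
        return record.get(fieldNum);
    }

    @Override
    public void set(int fieldNum, Object val) {
        throw new UnsupportedOperationException("read-only tuple");
    }

    @Override
    public void append(Object val) {
        throw new UnsupportedOperationException("read-only tuple");
    }
}
```

If the framework never mutates such tuples, the two throwing methods would
never be hit; if it does, they would at least fail loudly.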
Also, the AvroBagWrapper does not currently extend DefaultAbstractBag,
hence it cannot register with the SpillManager. It is unlikely that an
AvroBagWrapper would grow very large, but could the Pig framework reuse
the AvroBagWrapper object and add elements to it, making it grow even
though this bag could never spill? And if this cannot happen, then
effectively the AvroBagWrapper is treated as read-only, and if so, shouldn't
we make that explicit by not implementing the DataBag.add method?
Thank you,
R.