On Sep 5, 2012, at 6:30 PM, Prasanth J wrote:

> Ahh.. Now it makes more sense.
>
> I think I got the solution. I was adding to List<Tuple> and then finally
> creating a DataBag with that list.. Instead I should create a bag and keep
> adding to it..!! Is that correct?

Yes.

Alan.

> Thanks Alan.
>
> Thanks
> -- Prasanth
>
> On Sep 5, 2012, at 9:24 PM, Alan Gates <ga...@hortonworks.com> wrote:
>
>> You cannot modify a bag once it is written. The implementation is written
>> around the assumption that bags are immutable after they are written.
>>
>> Creating a new bag should not create an OOM exception, as bags are built to
>> spill when they grow too large. In fact it's this spilling feature that
>> makes in-place modification impossible.
>>
>> Alan.
>>
>> On Sep 5, 2012, at 6:08 PM, Prasanth J wrote:
>>
>>> Hello devs
>>>
>>> I have a specific case where I need to modify the contents (remove a field
>>> from each tuple) of a DataBag, but I want to do it in place and do not want
>>> to create another DataBag with a new set of tuples.
>>> The situation is, say I have the following input tuple for a UDF
>>>
>>> {(111,222,3,121), (112,223,2,131), (113,224,4,141)}
>>>
>>> I want to iterate through this bag and generate an output bag removing the
>>> 3rd field of each tuple in the bag to get the following output
>>> {(111,222,121), (112,223,131), (113,224,141)}
>>>
>>> Since the number of tuples in this bag is expected to be large, I cannot
>>> create a new set of tuples and create a bag, as this will cause an OOM
>>> exception.
>>>
>>> Also I do not want to flatten this bag, as this bag will be passed to the
>>> DISTINCT operator for computing distinct elements in the bag.
>>> As seen from the javadocs for DataBag, there is no way to convert a bag on
>>> the fly. I wonder if there is any other way to solve this?
>>>
>>> Thanks
>>> -- Prasanth
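
The approach confirmed in this thread (create the output bag first and add each projected tuple to it as you go, rather than collecting everything in a List<Tuple>) can be sketched roughly as follows. This is a toy model using only the Java standard library, not Pig code: `ToySpillableBag`, `DropFieldDemo`, `dropField`, and the comma-separated spill format are all hypothetical names and choices made for illustration, and tuples are modelled as `List<Long>`. In a real UDF the output bag would presumably come from `BagFactory.getInstance().newDefaultBag()` and be filled with `DataBag.add(Tuple)`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a spillable bag; NOT Pig's DataBag implementation.
class ToySpillableBag {
    private final int memoryLimit;                        // max tuples buffered in memory
    private final List<List<Long>> buffer = new ArrayList<>();
    private final Path spillFile = newSpillFile();
    private long spilled = 0;                             // tuples already written to disk

    ToySpillableBag(int memoryLimit) { this.memoryLimit = memoryLimit; }

    private static Path newSpillFile() {
        try {
            Path p = Files.createTempFile("toybag", ".spill");
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Append one tuple; once the buffer is full, flush it to the spill file.
    void add(List<Long> tuple) {
        buffer.add(tuple);
        if (buffer.size() >= memoryLimit) spill();
    }

    private void spill() {
        try (BufferedWriter w = Files.newBufferedWriter(spillFile, StandardOpenOption.APPEND)) {
            for (List<Long> t : buffer) {
                w.write(t.stream().map(String::valueOf).collect(Collectors.joining(",")));
                w.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spilled += buffer.size();
        buffer.clear();
    }

    long size() { return spilled + buffer.size(); }

    int buffered() { return buffer.size(); }

    // Demo only: reads everything back into memory to inspect the result.
    List<List<Long>> readAll() {
        List<List<Long>> all = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(spillFile)) {
                List<Long> t = new ArrayList<>();
                if (!line.isEmpty()) {
                    for (String f : line.split(",")) t.add(Long.parseLong(f));
                }
                all.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        all.addAll(buffer);
        return all;
    }
}

class DropFieldDemo {
    // Stream tuples from input into out, omitting the field at dropIndex.
    // Only one projected tuple is held here at a time; the bag decides when to spill.
    static void dropField(Iterable<List<Long>> input, int dropIndex, ToySpillableBag out) {
        for (List<Long> t : input) {
            List<Long> projected = new ArrayList<>();
            for (int i = 0; i < t.size(); i++) {
                if (i != dropIndex) projected.add(t.get(i));
            }
            out.add(projected);
        }
    }

    public static void main(String[] args) {
        List<List<Long>> input = Arrays.asList(
            Arrays.asList(111L, 222L, 3L, 121L),
            Arrays.asList(112L, 223L, 2L, 131L),
            Arrays.asList(113L, 224L, 4L, 141L));
        ToySpillableBag out = new ToySpillableBag(2);     // tiny limit to force a spill
        dropField(input, 2, out);                         // index 2 is the 3rd field
        System.out.println(out.readAll());
        // prints [[111, 222, 121], [112, 223, 131], [113, 224, 141]]
    }
}
```

In this toy model, tuples that have been spilled live in a sequential file rather than in memory, which gives one plausible picture of why a written bag is treated as immutable: changing a spilled tuple would mean rewriting the file. Adding to the bag incrementally keeps the in-memory footprint bounded by the buffer limit, whereas an intermediate List<Tuple> holds every tuple at once.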