It's also worth noting that the speed gain is minimal. We're talking 4 minutes per billion rows (per mapper) at most, and the speed gain only applies to size()-max(setIndex) i.e. if you have a tuple of size 30 and then set up to 15, then yeah, you wasted time nulling out 15 fields. It's pretty minimal, though if you want to tackle making the implementation more efficient and benchmark the difference, I'm all ears :)
2012/5/27 Ashutosh Chauhan <hashut...@apache.org> > I also had this question when I was playing with DefaultTuple once. I > removed that null-addition loop, but then it failed while setting an object > at an specific index. There is a code in Pig which sets value at a specific > index in the tuple, which will throw NPE if you dont first add null in the > list. This is one of those oddities of java. > > List<String> list = new ArrayList<String>(3); > list.set(1,"mystring"); > > throws NPE, while following doesn't. > > List<String> list = new ArrayList<String>(3); > for (int i =0; i < 3; i++) { > list.add(null); > } > list.set(1,"mystring"); > > Since client code using DefaultTuple is permissible to set value at any > index of tuple, we have no choice but to null-fill the list first. > > Hope it helps, > Ashutosh > > On Sat, May 26, 2012 at 9:13 PM, Jonathan Coveney <jcove...@gmail.com > >wrote: > > > -user > > +dev > > > > Haha, I made this very same comment somewhere, and noticed the exact same > > thing (I think I mention it in my SchemaTuple benchmarking). > > > > The reason is so that tuple.size() will return the right value. Also, the > > expectation is that if you append, it goes to the end of all of the > nulls, > > not the first position. It's a little confusing, and yeah, it surprised > me > > too. > > > > You could definitely amortize the cost of creation over the sets that the > > user does by keeping an index, but when I first saw it I decided that the > > (slightly) increased memory footprint and the increase in code complexity > > wasn't worth a very minimal increase. > > > > 2012/5/26 Prashant Kommireddi <prash1...@gmail.com> > > > > > I rambled across this while reviewing one of Jon's patches. Here is the > > > code from DefaultTuple > > > > > > /** > > > * Construct a tuple with a known number of fields. Package level so > > > that callers cannot directly invoke it. > > > * <br>Resulting tuple is filled pre-filled with null elements. Time > > > complexity: O(N), after allocation > > > * > > > * @param size > > > * Number of fields to allocate in the tuple. > > > */ > > > DefaultTuple(int size) { > > > mFields = new ArrayList<Object>(size); > > > for (int i = 0; i < size; i++) > > > mFields.add(null); > > > } > > > > > > > > > Why are we walking through the list to add nulls? Wouldn't the initial > > > creation of ArrayList suffice? > > > mFields = new ArrayList<Object>(size) should be enough. > > > > > > Thanks, > > > Prashant > > > > > >