Rohini Palaniswamy created PIG-3823:
---------------------------------------
Summary: Use newTupleNoCopy instead of newTuple wherever possible to avoid object copy penalty
Key: PIG-3823
URL: https://issues.apache.org/jira/browse/PIG-3823
Project: Pig
Issue Type: Bug
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
The readFields() method of every Tuple implementation does an mFields.clear() and
then loads data into the same tuple. If that were changed to do
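{code}
// instead of mFields.clear(): give the tuple a fresh list, so a list that was
// previously handed out (e.g. via newTupleNoCopy()) is never cleared or refilled
if (mFields.size() > 0) {
    mFields = new ArrayList<Object>(mFields.size());
}
{code}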
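then many places in the code that call TupleFactory.newTuple(List c) could call
TupleFactory.newTupleNoCopy(List c) instead. Because readFields() would no longer
clear and refill a list that a no-copy tuple may still hold, sharing the list
becomes safe, and we avoid an expensive System.arraycopy() call (a native method).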
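A minimal, self-contained Java sketch of why the readFields() change is what makes newTupleNoCopy() safe. SimpleTuple, keptAfterReuse() and the two factory methods are hypothetical stand-ins, not Pig's Tuple/TupleFactory API, and this readFields() takes the next record as a List rather than reading it from a DataInput. One tuple is reused across two records, the way a deserializer reuses it, and a second tuple built from its list after the first record is inspected once the second record has been read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NoCopyDemo {

    // Hypothetical stand-in for a Pig tuple: a list of fields that a
    // deserializer refills for every record it reads.
    static final class SimpleTuple {
        List<Object> mFields;

        SimpleTuple(List<Object> fields) {
            this.mFields = fields;
        }

        List<Object> getAll() {
            return mFields;
        }

        // Models readFields(), but takes the next record as a List instead
        // of reading it from a DataInput.
        // freshList == false: current behaviour, clear() and refill in place.
        // freshList == true:  proposed behaviour, swap in a new ArrayList.
        void readFields(List<Object> record, boolean freshList) {
            if (freshList) {
                if (mFields.size() > 0) {
                    mFields = new ArrayList<Object>(mFields.size());
                }
            } else {
                mFields.clear();
            }
            mFields.addAll(record);
        }
    }

    // Like newTuple(List): the new tuple gets its own copy of the list.
    static SimpleTuple newTuple(List<Object> c) {
        return new SimpleTuple(new ArrayList<Object>(c));
    }

    // Like newTupleNoCopy(List): the new tuple takes over the given list.
    static SimpleTuple newTupleNoCopy(List<Object> c) {
        return new SimpleTuple(c);
    }

    // Reads two records into one reused tuple, builds a second tuple from
    // its list after the first record, and returns what that second tuple
    // holds once the second record has been read.
    static List<Object> keptAfterReuse(boolean freshList, boolean noCopy) {
        SimpleTuple reused = new SimpleTuple(new ArrayList<Object>());
        reused.readFields(Arrays.<Object>asList("a", 1), freshList);
        SimpleTuple kept = noCopy ? newTupleNoCopy(reused.getAll())
                                  : newTuple(reused.getAll());
        reused.readFields(Arrays.<Object>asList("b", 2), freshList);
        return kept.getAll();
    }

    public static void main(String[] args) {
        System.out.println(keptAfterReuse(false, false)); // [a, 1]
        System.out.println(keptAfterReuse(false, true));  // [b, 2]
        System.out.println(keptAfterReuse(true, true));   // [a, 1]
    }
}
```

With the current clear()-and-refill behaviour, only the copying newTuple() keeps the first record; the no-copy tuple silently ends up holding the second one. Once readFields() swaps in a fresh list, the no-copy tuple keeps the first record as well, without any copy.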
One such place is getValueTuple() in POPackage.java:
{code}
if (keyLookupSize > 0) {
    // we have some fields of the "value" in the
    // "key".
    int finalValueSize = keyLookupSize + val.size();
    copy = mTupleFactory.newTuple(finalValueSize);
    int valIndex = 0; // an index for accessing elements from
                      // the value (val) that we have currently
    for (int i = 0; i < finalValueSize; i++) {
        Integer keyIndex = keyLookup.get(i);
        if (keyIndex == null) {
            // the field for this index is not in the
            // key - so just take it from the "value"
            // we were handed
            copy.set(i, val.get(valIndex));
            valIndex++;
        } // .....
    }
} else if (isProjectStar) {
    // the whole "value" is present in the "key"
    copy = mTupleFactory.newTuple(keyAsTuple.getAll());
} else {
    // there is no field of the "value" in the
    // "key" - so just make a copy of what we got
    // as the "value"
    copy = mTupleFactory.newTuple(val.getAll());
}
{code}
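A sketch of how the two copying branches above could look with a no-copy call. It uses hypothetical stand-ins (SimpleTuple, newTuple(), newTupleNoCopy(), valueTuple()) rather than Pig's classes, and leaves out the keyLookupSize > 0 branch, which never copies a list. It assumes readFields() has already been changed as described, since the returned tuple now shares its list with keyAsTuple or val.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GetValueTupleSketch {

    // Hypothetical stand-in for a Pig tuple backed by a list.
    static final class SimpleTuple {
        final List<Object> fields;

        SimpleTuple(List<Object> fields) {
            this.fields = fields;
        }

        List<Object> getAll() {
            return fields;
        }
    }

    // Mirrors newTuple(List): allocates a new list and copies every element.
    static SimpleTuple newTuple(List<Object> c) {
        return new SimpleTuple(new ArrayList<Object>(c));
    }

    // Mirrors newTupleNoCopy(List): the new tuple adopts the caller's list.
    static SimpleTuple newTupleNoCopy(List<Object> c) {
        return new SimpleTuple(c);
    }

    // Proposed form of the isProjectStar and plain-"value" branches.
    static SimpleTuple valueTuple(boolean isProjectStar,
                                  SimpleTuple keyAsTuple, SimpleTuple val) {
        if (isProjectStar) {
            // the whole "value" is present in the "key": adopt the key's list
            return newTupleNoCopy(keyAsTuple.getAll());
        }
        // no field of the "value" is in the "key": adopt the value's list
        return newTupleNoCopy(val.getAll());
    }

    public static void main(String[] args) {
        SimpleTuple key = new SimpleTuple(new ArrayList<Object>(Arrays.<Object>asList("k", 1)));
        SimpleTuple val = new SimpleTuple(new ArrayList<Object>(Arrays.<Object>asList("v", 2)));

        // No copy: the result shares the value's backing list.
        System.out.println(valueTuple(false, key, val).getAll() == val.getAll()); // true
        // Current code: newTuple() produces an equal but separate list.
        System.out.println(newTuple(val.getAll()).getAll() == val.getAll());      // false
    }
}
```

The identity checks show the point of the change: the no-copy branches hand back the very list that was deserialized, so no array is allocated or copied per value.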
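Some cases might take a slight GC hit from the new ArrayList allocation in
readFields(). For example, the keyLookupSize > 0 branch in the code above builds
its tuple with newTuple(int) rather than copying a list, so it gains nothing from
the change and only pays for the extra allocation. The other two branches would
benefit greatly, since the ArrayList copy is avoided. The same applies to POFRJoin.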
Will run some tests to validate the theory and post a patch.
--
This message was sent by Atlassian JIRA
(v6.2#6252)