Rohini Palaniswamy created PIG-3823:
---------------------------------------

             Summary: Use newTupleNoCopy instead of newTuple wherever possible 
to avoid object copy penalty
                 Key: PIG-3823
                 URL: https://issues.apache.org/jira/browse/PIG-3823
             Project: Pig
          Issue Type: Bug
            Reporter: Rohini Palaniswamy
            Assignee: Rohini Palaniswamy


The readFields() method of every Tuple implementation calls mFields.clear() and 
then loads data into the same tuple. If that were changed to

{code}
if (mFields.size() > 0) {
    mFields = new ArrayList<Object>(mFields.size());
}
{code}

many places in the code that call TupleFactory.newTuple(List c) could be 
replaced with TupleFactory.newTupleNoCopy(List c). This would avoid an expensive 
System.arraycopy() call, which is a native method. 
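
The presumed reasoning is an aliasing one: if a tuple built without a copy 
shares its list with a tuple that is later refilled by readFields(), a clear() 
on that list would wipe the shared data, whereas allocating a fresh list would 
leave it intact. A minimal sketch of that idea follows. SketchTuple, 
AliasingSketch and all of their methods are hypothetical stand-ins written for 
illustration; they are not Pig's DefaultTuple or TupleFactory, and only the 
backing list is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Pig tuple; only the backing list is modeled.
class SketchTuple {
    List<Object> mFields;

    // Models the no-copy factory call: the tuple adopts the caller's list.
    static SketchTuple wrapNoCopy(List<Object> fields) {
        SketchTuple t = new SketchTuple();
        t.mFields = fields;
        return t;
    }

    // Models the copying factory call: the tuple gets its own list.
    static SketchTuple wrapCopy(List<Object> fields) {
        SketchTuple t = new SketchTuple();
        t.mFields = new ArrayList<Object>(fields);
        return t;
    }

    // Behavior described in the issue: clear the list in place, then refill.
    void reloadWithClear(List<Object> incoming) {
        mFields.clear();
        mFields.addAll(incoming);
    }

    // Proposed behavior: abandon the old list and fill a fresh one.
    void reloadWithNewList(List<Object> incoming) {
        if (mFields.size() > 0) {
            mFields = new ArrayList<Object>(mFields.size());
        }
        mFields.addAll(incoming);
    }
}

class AliasingSketch {
    // The alias shares the source's list, so clear() overwrites its data.
    static List<Object> sharedAfterClear() {
        SketchTuple source = SketchTuple.wrapCopy(Arrays.<Object>asList("a", "b"));
        SketchTuple alias = SketchTuple.wrapNoCopy(source.mFields);
        source.reloadWithClear(Arrays.<Object>asList("x", "y"));
        return alias.mFields;
    }

    // The source switches to a fresh list, so the alias keeps the old data.
    static List<Object> sharedAfterNewList() {
        SketchTuple source = SketchTuple.wrapCopy(Arrays.<Object>asList("a", "b"));
        SketchTuple alias = SketchTuple.wrapNoCopy(source.mFields);
        source.reloadWithNewList(Arrays.<Object>asList("x", "y"));
        return alias.mFields;
    }

    public static void main(String[] args) {
        System.out.println(sharedAfterClear());   // prints [x, y]
        System.out.println(sharedAfterNewList()); // prints [a, b]
    }
}
```

In the sketch the no-copy tuple stays valid only under the second strategy, 
which is the property the proposed readFields() change would need to provide.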

POPackage.java, getValueTuple():
{code}
if( keyLookupSize > 0) {

            // we have some fields of the "value" in the
            // "key".
            int finalValueSize = keyLookupSize + val.size();
            copy = mTupleFactory.newTuple(finalValueSize);
            int valIndex = 0; // an index for accessing elements from
                              // the value (val) that we have currently
            for(int i = 0; i < finalValueSize; i++) {
                Integer keyIndex = keyLookup.get(i);
                if(keyIndex == null) {
                    // the field for this index is not in the
                    // key - so just take it from the "value"
                    // we were handed
                    copy.set(i, val.get(valIndex));
                    valIndex++;
                } .....
        } else if (isProjectStar) {
            // the whole "value" is present in the "key"
            copy = mTupleFactory.newTuple(keyAsTuple.getAll());   
        } else {
            // there is no field of the "value" in the
            // "key" - so just make a copy of what we got
            // as the "value"
            copy = mTupleFactory.newTuple(val.getAll());
        }
{code}

Some cases might take a slight GC hit from the new ArrayList allocation, for 
example the if (keyLookupSize > 0) branch in the code above, since it does not 
call newTuple(List). The other two branches would benefit greatly because they 
avoid the ArrayList copy. The same applies to POFRJoin.

Will run some tests to validate the theory and post a patch.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
