Cheolsoo Park created PIG-3319:
----------------------------------

             Summary: Race condition in POStream
                 Key: PIG-3319
                 URL: https://issues.apache.org/jira/browse/PIG-3319
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.11.1
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
             Fix For: 0.12


When LOAD is immediately followed by STREAM, the Pig job intermittently fails
with either ConcurrentModificationException or IndexOutOfBoundsException.
{code}
a = LOAD '<input>' USING MyLoadFunc();
b = STREAM a THROUGH dummy AS (foo:chararray);
DUMP b;
{code}
The problem occurs when the LoadFunc creates each new tuple using
TupleFactory.newTupleNoCopy and reuses the same fields list object.
newTupleNoCopy keeps the provided list instead of copying it, so that list can
be accessed concurrently by ProcessInputThread and POStream.
{code}
/**
 * Create a tuple from a provided list of objects, keeping the provided
 * list.  The new tuple will take over ownership of the provided list.
 * @param list List of objects that will become the fields of the tuple.
 * @return A tuple with the list objects as its fields
 */
public abstract Tuple newTupleNoCopy(List list);
{code}
Here is an example:
# LoadFunc loads a line and creates a new tuple using List<Object> L.
# POStream passes it to the ProcessInputThread of ExecutableManager.
# ProcessInputThread starts iterating L to serialize it before feeding it to 
the sub-process. 
# LoadFunc loads another line and creates a new tuple by re-using L.
# ConcurrentModificationException is thrown because L is modified while being 
iterated.
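
The sequence above can be reproduced without Pig. The following is only a 
minimal sketch: NoCopyTuple and replay are hypothetical names, not Pig classes, 
and the two threads are replayed in a single thread so the interleaving is 
deterministic. It assumes the fields list is a java.util.ArrayList, whose 
iterator is fail-fast. Passing copyFields=true shows one possible way to avoid 
the failure (handing over a private copy of the list); it is not necessarily 
the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class SharedListRaceSketch {

    /** Hypothetical stand-in for a tuple that keeps the caller's list (no copy). */
    static final class NoCopyTuple {
        final List<Object> fields;
        NoCopyTuple(List<Object> fields) { this.fields = fields; }
    }

    /**
     * Replays steps 1-5 in one thread.
     *
     * @param copyFields if true, the loader hands over a copy of L instead of L itself
     * @return true if a ConcurrentModificationException was thrown
     */
    static boolean replay(boolean copyFields) {
        // Step 1: the loader fills L and wraps it in a tuple.
        List<Object> l = new ArrayList<>();
        l.add("line-1-field-0");
        l.add("line-1-field-1");
        NoCopyTuple first = new NoCopyTuple(copyFields ? new ArrayList<>(l) : l);

        // Steps 2-3: the serializer starts iterating the tuple's fields.
        Iterator<Object> it = first.fields.iterator();
        it.next();

        // Step 4: the loader reuses L for the next line.
        l.clear();
        l.add("line-2-field-0");
        l.add("line-2-field-1");

        // Step 5: the serializer resumes and notices the modification.
        try {
            while (it.hasNext()) {
                it.next();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("shared list, exception thrown: " + replay(false));
        System.out.println("copied list, exception thrown: " + replay(true));
    }
}
```

With a shared list the iterator fails on its next call; with a copied list the 
serializer finishes reading the first line's fields untouched. In a real 
multi-threaded run the outcome depends on timing, which is consistent with the 
intermittent failures described above.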
