Cheolsoo Park created PIG-3319:
----------------------------------
Summary: Race condition in POStream
Key: PIG-3319
URL: https://issues.apache.org/jira/browse/PIG-3319
Project: Pig
Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Fix For: 0.12
When LOAD is immediately followed by STREAM, the Pig job intermittently fails
with either ConcurrentModificationException or IndexOutOfBoundsException.
{code}
a = LOAD '<input>' USING MyLoadFunc();
b = STREAM a THROUGH dummy AS (foo:chararray);
DUMP b;
{code}
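The problem is that TupleFactory.newTupleNoCopy does not copy the list it is
given: the new tuple uses the provided list object as its fields. If the
LoadFunc re-uses that list for the next record, the same list can be read by
ProcessInputThread and modified on the POStream side at the same time.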
{code}
/**
 * Create a tuple from a provided list of objects, keeping the provided
 * list. The new tuple will take over ownership of the provided list.
 * @param list List of objects that will become the fields of the tuple.
 * @return A tuple with the list objects as its fields
 */
public abstract Tuple newTupleNoCopy(List list);
{code}
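To make the ownership contract in that Javadoc concrete, here is a minimal sketch in plain Java. It does not use Pig at all: OwnershipDemo, MiniTuple, copyOf and noCopy are hypothetical names invented for illustration, standing in for a copying factory method and for newTupleNoCopy respectively.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature; MiniTuple is NOT Pig's Tuple or TupleFactory.
class OwnershipDemo {

    static final class MiniTuple {
        private final List<Object> fields;

        private MiniTuple(List<Object> fields) {
            this.fields = fields;
        }

        // Copying path: the tuple gets a private list of its own.
        static MiniTuple copyOf(List<Object> list) {
            return new MiniTuple(new ArrayList<>(list));
        }

        // No-copy path: the tuple keeps a reference to the caller's list.
        static MiniTuple noCopy(List<Object> list) {
            return new MiniTuple(list);
        }

        Object get(int index) {
            return fields.get(index);
        }
    }

    public static void main(String[] args) {
        List<Object> shared = new ArrayList<>();
        shared.add("first line");

        MiniTuple copied = MiniTuple.copyOf(shared);
        MiniTuple aliased = MiniTuple.noCopy(shared);

        // The caller breaks the ownership contract by re-using the list.
        shared.set(0, "second line");

        System.out.println("copied:  " + copied.get(0));   // first line
        System.out.println("aliased: " + aliased.get(0));  // second line
    }
}
```

The aliased tuple changes when the caller touches the list again, which is why a caller of a no-copy method is expected to hand over a fresh list for every tuple and never touch it afterwards.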
Here is an example:
# LoadFunc loads a line and creates a new tuple using List<Object> L.
# POStream passes it to the ProcessInputThread of ExecutableManager.
# ProcessInputThread starts iterating L to serialize it before feeding it to
the sub-process.
# LoadFunc loads another line and creates a new tuple by re-using L.
# ConcurrentModificationException is thrown because L is modified while being
iterated.
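The interleaving in the steps above can be sketched outside Pig with a plain ArrayList and two threads. This is an illustration under stated assumptions, not Pig code: SharedListRace and reproduce are hypothetical names, the reader thread only plays the role of ProcessInputThread, and the latches force the ordering that happens intermittently in a real job.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the scenario; none of these names come from Pig.
class SharedListRace {

    // Returns true if the reader thread hit ConcurrentModificationException.
    static boolean reproduce() {
        // L: the single list the loader keeps re-using for every record.
        final List<Object> fields = new ArrayList<>();
        fields.add("line-1-col-a");
        fields.add("line-1-col-b");

        final CountDownLatch readerStarted = new CountDownLatch(1);
        final CountDownLatch loaderDone = new CountDownLatch(1);
        final boolean[] sawCme = {false};

        // Plays the role of ProcessInputThread: iterates L to serialize it.
        Thread reader = new Thread(() -> {
            try {
                Iterator<Object> it = fields.iterator();
                it.next();                  // serialization is under way
                readerStarted.countDown();
                loaderDone.await();         // latch forces the bad interleaving
                it.next();                  // L changed underneath the iterator
            } catch (ConcurrentModificationException e) {
                sawCme[0] = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        try {
            // Plays the role of the loader: puts the next line into the same L.
            readerStarted.await();
            fields.clear();
            fields.add("line-2-col-a");
            fields.add("line-2-col-b");
            loaderDone.countDown();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while coordinating threads", e);
        }
        return sawCme[0];
    }

    public static void main(String[] args) {
        System.out.println("CME reproduced: " + reproduce());
    }
}
```

Replacing the clear-and-refill with a freshly allocated list per record makes the exception go away in this sketch, because the reader's iterator then belongs to a list nobody else modifies.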