Cheolsoo Park created PIG-3319:
----------------------------------

             Summary: Race condition in POStream
                 Key: PIG-3319
                 URL: https://issues.apache.org/jira/browse/PIG-3319
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.11.1
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
             Fix For: 0.12

When LOAD is immediately followed by STREAM, the Pig job intermittently fails with either ConcurrentModificationException or IndexOutOfBoundsException.

{code}
a = LOAD '<input>' USING MyLoadFunc();
b = STREAM a THROUGH dummy AS (foo:chararray);
DUMP b;
{code}

The problem is that if the LoadFunc creates a new tuple using TupleFactory.newTupleNoCopy, the fields list object is shared with the tuple rather than copied. If the LoadFunc then reuses that list for the next tuple, the list can be modified while ProcessInputThread is still iterating over it on behalf of POStream.

{code}
/**
 * Create a tuple from a provided list of objects, keeping the provided
 * list. The new tuple will take over ownership of the provided list.
 * @param list List of objects that will become the fields of the tuple.
 * @return A tuple with the list objects as its fields
 */
public abstract Tuple newTupleNoCopy(List list);
{code}

Here is an example:
# LoadFunc loads a line and creates a new tuple using List<Object> L.
# POStream passes it to the ProcessInputThread of ExecutableManager.
# ProcessInputThread starts iterating L to serialize it before feeding it to the sub-process.
# LoadFunc loads another line and creates a new tuple by re-using L.
# ConcurrentModificationException is thrown because L is modified while being iterated.
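
The interleaving in the steps above can be replayed outside Pig. The following is a minimal sketch, not Pig code: NoCopyTuple and ReusedListDemo are hypothetical stand-ins, and the two "threads" are collapsed into one so that the failure is deterministic. It only illustrates why sharing a reused ArrayList trips the fail-fast iterator. Allocating a fresh list per tuple is shown as one possible way to avoid the problem, which is not necessarily the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a tuple that keeps the caller's list without
// copying it, mirroring the contract documented for newTupleNoCopy.
final class NoCopyTuple {
    final List<Object> fields;

    NoCopyTuple(List<Object> fields) {
        this.fields = fields; // no defensive copy: the tuple shares the list
    }
}

class ReusedListDemo {
    // Replays steps 1-5 on a single thread.
    // Returns true if the "serializer" hits ConcurrentModificationException.
    static boolean raceOccurs(boolean freshListPerTuple) {
        // Step 1: the loader builds the first tuple on list L.
        List<Object> l = new ArrayList<>();
        l.add("line-1");
        l.add(1);
        NoCopyTuple first = new NoCopyTuple(l);

        // Steps 2-3: the input thread starts iterating the tuple's fields.
        Iterator<Object> it = first.fields.iterator();
        it.next();

        // Step 4: the loader builds the next tuple, either re-using L
        // or allocating a new list.
        List<Object> next = freshListPerTuple ? new ArrayList<>() : l;
        next.clear();
        next.add("line-2");
        next.add(2);
        NoCopyTuple second = new NoCopyTuple(next);

        // Step 5: the input thread resumes iterating the first tuple.
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("re-used list fails: " + raceOccurs(false));
        System.out.println("fresh list fails:   " + raceOccurs(true));
    }
}
```

In a real job the two sides run on separate threads, so the outcome depends on timing, which is consistent with the failure being intermittent.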