Cheolsoo Park created PIG-3466:
----------------------------------

             Summary: Race Conditions in InternalDistinctBag during proactive spill
                 Key: PIG-3466
                 URL: https://issues.apache.org/jira/browse/PIG-3466
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.11.1
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
             Fix For: 0.12


I have several jobs that use the following pattern:
{code}
b = group a by x;
c = foreach b {
    dist_y = DISTINCT a.y;
    generate group, COUNT(dist_y) as y_cnt;
};
{code}
These jobs fail intermittently during proactive spill when the data set is large:
{code}
java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
        at java.util.HashMap$KeyIterator.next(HashMap.java:828)
        at java.util.AbstractCollection.toArray(AbstractCollection.java:171)
        at org.apache.pig.data.SortedSpillBag.proactive_spill(SortedSpillBag.java:77)
        at org.apache.pig.data.InternalDistinctBag.spill(InternalDistinctBag.java:464)
        at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:274)
        at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
        at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
        at sun.management.MemoryPoolImpl$PoolSensor.triggerAction(MemoryPoolImpl.java:272)
        at sun.management.Sensor.trigger(Sensor.java:120)
{code}
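The exception comes from Java's fail-fast iterators: toArray() walks the set through a HashMap$KeyIterator, and a structural change to the map after the iterator is created makes the next next() call throw. The standalone sketch below (a hypothetical CmeDemo class, not Pig code) reproduces the same exception deterministically on a single thread by adding to a HashSet mid-iteration. In the real job the add comes from a different thread, which is why the failure is intermittent.

{code:java}
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class CmeDemo {
    /** Returns true if adding to a HashSet mid-iteration trips the fail-fast check. */
    static boolean mutateDuringIteration() {
        Set<Integer> contents = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            contents.add(i);
        }
        try {
            Iterator<Integer> it = contents.iterator();
            while (it.hasNext()) {
                it.next();
                // Stands in for the task thread adding a tuple while spill() iterates.
                contents.add(contents.size() + 100);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentModificationException thrown: " + mutateDuringIteration());
    }
}
{code}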
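Because spill() is invoked from the JVM memory-notification thread (see SpillableMemoryManager.handleNotification in the trace), it can iterate over the bag's contents while the task thread is still adding tuples. PIG-3212 fixed the same issue in *InternalSortedBag* by synchronizing access to the bag's contents. *InternalDistinctBag* was not fixed, however, so the issue remains for nested DISTINCT.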
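A minimal sketch of the PIG-3212-style fix, assuming a bag backed by a HashSet: the hypothetical SyncDistinctBag class guards both the add path and the spill snapshot with the same lock, so toArray() never observes a concurrent modification. It illustrates the synchronization idea only and is not the actual InternalDistinctBag patch.

{code:java}
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical distinct bag (not Pig's InternalDistinctBag). Every access to
 * the backing set, including the spill snapshot, holds the same lock.
 */
public class SyncDistinctBag {
    private final Set<Integer> contents = new HashSet<>();
    private long spilledCount = 0; // guarded by the lock on contents

    /** Task thread: adds a value; duplicates are dropped by the set. */
    public void add(int value) {
        synchronized (contents) {
            contents.add(value);
        }
    }

    /** Memory-manager thread: copies and clears the contents under the lock. */
    public int spill() {
        Object[] snapshot;
        synchronized (contents) {
            snapshot = contents.toArray(); // same kind of toArray() call as in the trace
            contents.clear();
            spilledCount += snapshot.length;
        }
        // A real bag would sort `snapshot` and write it to disk here, outside the lock.
        return snapshot.length;
    }

    /** Number of values spilled so far plus those still held in memory. */
    public long totalSeen() {
        synchronized (contents) {
            return spilledCount + contents.size();
        }
    }

    /** Adds n distinct values on one thread while another thread keeps spilling. */
    static long runDemo(int n) {
        SyncDistinctBag bag = new SyncDistinctBag();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                bag.add(i);
            }
        });
        Thread spiller = new Thread(() -> {
            while (producer.isAlive()) {
                bag.spill();
            }
        });
        producer.start();
        spiller.start();
        try {
            producer.join();
            spiller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bag.totalSeen();
    }

    public static void main(String[] args) {
        System.out.println("total seen: " + runDemo(200_000));
    }
}
{code}

Since every value is distinct, each one ends up either spilled or still in memory, so runDemo(200_000) should report 200000. Removing the synchronized blocks may make the spiller thread fail intermittently with the same ConcurrentModificationException. Sorting and writing the snapshot happen outside the lock so the task thread is blocked only for the copy; a real distinct bag must also deduplicate across spill files when reading them back.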
