Cheolsoo Park created PIG-3466:
----------------------------------
Summary: Race Conditions in InternalDistinctBag during proactive spill
Key: PIG-3466
URL: https://issues.apache.org/jira/browse/PIG-3466
Project: Pig
Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Fix For: 0.12
I have several jobs that use the following pattern:
{code}
b = group a by x;
c = foreach b {
  dist_y = DISTINCT a.y;
  generate
    group,
    COUNT(dist_y) as y_cnt;
};
{code}
These jobs fail intermittently during proactive spill when the data set is
large:
{code}
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
at java.util.HashMap$KeyIterator.next(HashMap.java:828)
at java.util.AbstractCollection.toArray(AbstractCollection.java:171)
at org.apache.pig.data.SortedSpillBag.proactive_spill(SortedSpillBag.java:77)
at org.apache.pig.data.InternalDistinctBag.spill(InternalDistinctBag.java:464)
at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:274)
at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
at sun.management.MemoryPoolImpl$PoolSensor.triggerAction(MemoryPoolImpl.java:272)
at sun.management.Sensor.trigger(Sensor.java:120)
{code}
PIG-3212 fixed the same issue for *InternalSortedBag* by synchronizing access
to the bag's contents. *InternalDistinctBag* did not receive the same fix, so
the issue remains for nested DISTINCT.
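
The stack trace shows the memory-notification thread calling toArray() on a
HashMap-backed set, presumably while the task thread is still adding tuples to
it. A minimal sketch of the synchronization approach described above could look
like the following. It is not Pig's actual code: the class SyncedDistinctBag and
its methods add, snapshotForSpill, and size are hypothetical names, and String
stands in for a tuple.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a distinct bag; NOT Pig's InternalDistinctBag.
public class SyncedDistinctBag {
    // HashSet iterators are fail-fast, so unsynchronized iteration during
    // concurrent modification may throw ConcurrentModificationException.
    private final Set<String> contents = new HashSet<>();

    // Called by the task thread for each incoming value.
    public boolean add(String value) {
        synchronized (contents) {
            return contents.add(value);
        }
    }

    // Called by the memory-notification thread during a proactive spill.
    // Copying and clearing under the same lock as add() guarantees that
    // toArray() never iterates the set while it is being modified.
    public String[] snapshotForSpill() {
        synchronized (contents) {
            String[] copy = contents.toArray(new String[0]);
            contents.clear();
            return copy;
        }
    }

    public int size() {
        synchronized (contents) {
            return contents.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SyncedDistinctBag bag = new SyncedDistinctBag();
        final int[] spilled = {0};

        // Simulates the task thread filling the bag.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 200_000; i++) {
                bag.add("y" + (i % 50_000));
            }
        });
        // Simulates repeated proactive spills from another thread.
        Thread spiller = new Thread(() -> {
            for (int i = 0; i < 200; i++) {
                spilled[0] += bag.snapshotForSpill().length;
            }
        });

        writer.start();
        spiller.start();
        writer.join();
        spiller.join();
        System.out.println("spilled=" + spilled[0] + " inMemory=" + bag.size());
    }
}
```

Removing the synchronized blocks from this sketch makes the two threads race on
the same HashSet, which can surface as the intermittent exception shown above.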