[
https://issues.apache.org/jira/browse/PIG-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheolsoo Park updated PIG-3466:
-------------------------------
Attachment: PIG-3466-1.patch
The attached patch applies to InternalDistinctBag the same fix that PIG-3212
applied to InternalSortedBag.
All unit tests pass.
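For readers who have not looked at PIG-3212, the idea of synchronizing access to the bag's contents can be sketched roughly as below. This is a minimal illustration only, not the patch itself: the class GuardedDistinctBag, its methods, and the use of String values are hypothetical stand-ins, and the real bag also sorts the data and writes it to disk.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a distinct bag; NOT the actual Pig class or patch.
class GuardedDistinctBag {
    // Shared by the thread that adds values and the thread that spills.
    private final Set<String> contents = new HashSet<>();

    // Called by the processing thread for each incoming value.
    public void add(String value) {
        synchronized (contents) {
            contents.add(value);
        }
    }

    // Called from the memory-notification thread. The copy and the clear happen
    // under the same lock as add(), so toArray() never iterates a set that is
    // being modified. Returns the number of values taken out of memory.
    public int spill() {
        Object[] snapshot;
        synchronized (contents) {
            snapshot = contents.toArray();
            contents.clear();
        }
        // Sorting and writing the snapshot to disk would happen here,
        // outside the lock (omitted in this sketch).
        return snapshot.length;
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedDistinctBag bag = new GuardedDistinctBag();
        // One thread keeps adding unique values while the main thread spills.
        Thread adder = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                bag.add("k" + i);
            }
        });
        adder.start();
        long total = 0;
        while (adder.isAlive()) {
            total += bag.spill();
        }
        adder.join();
        total += bag.spill(); // pick up anything added after the last spill
        System.out.println(total); // prints 100000
    }
}
```

Holding the lock only for the copy and clear keeps the critical section short; whether the actual patch scopes its lock this way is not something this sketch claims.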
> Race Conditions in InternalDistinctBag during proactive spill
> -------------------------------------------------------------
>
> Key: PIG-3466
> URL: https://issues.apache.org/jira/browse/PIG-3466
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: PIG-3466-1.patch
>
>
> I have several jobs that use the following pattern:
> {code}
> b = group a by x;
> c = foreach b {
>     dist_y = DISTINCT a.y;
>     generate
>         group,
>         COUNT(dist_y) as y_cnt;
> };
> {code}
> These jobs fail intermittently during proactive spill when the data set is
> large:
> {code}
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
> at java.util.HashMap$KeyIterator.next(HashMap.java:828)
> at java.util.AbstractCollection.toArray(AbstractCollection.java:171)
> at org.apache.pig.data.SortedSpillBag.proactive_spill(SortedSpillBag.java:77)
> at org.apache.pig.data.InternalDistinctBag.spill(InternalDistinctBag.java:464)
> at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:274)
> at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
> at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
> at sun.management.MemoryPoolImpl$PoolSensor.triggerAction(MemoryPoolImpl.java:272)
> at sun.management.Sensor.trigger(Sensor.java:120)
> {code}
> PIG-3212 fixed the same issue for *InternalSortedBag* by synchronizing
> access to the bag's contents. *InternalDistinctBag* was not fixed, however, so
> the issue remains for nested DISTINCT.
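
The exception in the trace above comes from the fail-fast iterator that HashSet uses inside toArray(). The sketch below reproduces the same check in a single thread, so it is deterministic: the add() in the middle stands in for a tuple arriving from the processing thread while the spill is iterating. The class and method names are made up for illustration and are not part of Pig.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class; it only illustrates the fail-fast check.
class FailFastDemo {
    // Returns true if a structural change made mid-iteration is detected.
    static boolean modificationDetected() {
        Set<Integer> contents = new HashSet<>();
        contents.add(1);
        contents.add(2);

        Iterator<Integer> it = contents.iterator();
        it.next();       // iteration has started, as it would inside toArray()
        contents.add(3); // stands in for an add from another thread

        try {
            it.next();   // the iterator notices the set changed underneath it
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modificationDetected()); // prints true
    }
}
```

Across real threads the check is best-effort, which is consistent with the jobs failing only intermittently.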
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira