[ https://issues.apache.org/jira/browse/PIG-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175322#comment-14175322 ]
David Dreyfus commented on PIG-4012:
------------------------------------
The issue has nothing to do with unstable sorting or the fact that spillables
change in size.
Earlier versions of Java gracefully handled non-stable sorting. Current
versions do not, and throw the exception instead.
The issue is the exception that is generated. I started looking at the sun
memory management code. I didn't see anything that silently logs and ignores
exceptions, so I'll assume the job fails on this exception. Even if the
exception is caught (a big if), no memory reduction will take place when the
exception does occur.
If the whole purpose of the sorting is to identify a few spillables to spill so
as to minimize the performance impact of spilling everything, and you are
concerned about the time and memory it takes to copy the list of spillables
(not their content, obviously), you might also be concerned about the O(n log n)
time it takes to sort the list. Perhaps this should be addressed:
1) A configuration option that just causes Pig to spill everything if the
number of spillables is above a threshold.
2) A minimum spill size such that on notification we spill everything spillable
with estimated size greater than this threshold.
In both of these solutions sorting can be eliminated. This might be ideal for
those use cases where the size of the spillables list dominates memory usage.
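A minimal sketch of the two options above, to make the idea concrete. Everything here is an assumption for illustration: SpillableSketch is a hypothetical stand-in for Pig's Spillable, and the two thresholds are hypothetical settings, not existing Pig configuration properties. The handler makes a single pass over the list and never sorts it.

```java
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Pig's Spillable; only the two methods used here.
interface SpillableSketch {
    long getMemorySize();
    long spill();
}

final class ThresholdSpiller {
    // Hypothetical settings, not existing Pig properties.
    private final int maxSpillablesBeforeSpillAll; // option 1
    private final long minSpillSizeBytes;          // option 2

    ThresholdSpiller(int maxSpillablesBeforeSpillAll, long minSpillSizeBytes) {
        this.maxSpillablesBeforeSpillAll = maxSpillablesBeforeSpillAll;
        this.minSpillSizeBytes = minSpillSizeBytes;
    }

    // One pass, no sort. Returns the total reported by spill() calls.
    long handleLowMemory(List<WeakReference<SpillableSketch>> spillables) {
        // Option 1: too many spillables to be worth ranking, so spill them all.
        boolean spillAll = spillables.size() > maxSpillablesBeforeSpillAll;
        long freed = 0;
        Iterator<WeakReference<SpillableSketch>> it = spillables.iterator();
        while (it.hasNext()) {
            SpillableSketch s = it.next().get();
            if (s == null) {
                it.remove(); // referent was garbage collected
                continue;
            }
            // Option 2: spill only what is estimated to be larger than the minimum.
            if (spillAll || s.getMemorySize() > minSpillSizeBytes) {
                freed += s.spill();
            }
        }
        return freed;
    }
}
```

Because each size is read once and never compared against another, a size that changes between reads cannot cause the TimSort exception here.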
Another approach would be to sort without copying, capture the exception, and
either keep on retrying, or then copy and sort. My guess is this would
ultimately be slower in those cases where the exception does occur.
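A rough sketch of the "sort first, copy only on failure" approach, under stated assumptions. SizedSketch and FallbackSort are hypothetical names, not Pig classes. The sketch takes a LinkedList on purpose: sorting a LinkedList works on an internal array copy, so a sort that throws leaves the list itself untouched and the fallback can start from the original order. That would not be a safe assumption for a list that sorts its backing array in place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical stand-in for anything whose reported size may change over time.
interface SizedSketch {
    long getMemorySize();
}

final class FallbackSort {
    private FallbackSort() {}

    // Pairs an item with a size that was read exactly once.
    private static final class Snapshot<T> {
        final T item;
        final long size;
        Snapshot(T item, long size) { this.item = item; this.size = size; }
    }

    // Sorts largest first. Returns true if the copy-and-sort fallback was needed.
    static <T extends SizedSketch> boolean sortLargestFirst(LinkedList<T> items) {
        try {
            // Cheap path: compare live sizes, which may change mid-sort.
            Collections.sort(items, new Comparator<T>() {
                @Override public int compare(T a, T b) {
                    return Long.compare(b.getMemorySize(), a.getMemorySize());
                }
            });
            return false;
        } catch (IllegalArgumentException e) {
            // TimSort noticed inconsistent comparisons; redo with frozen sizes.
            sortBySnapshot(items);
            return true;
        }
    }

    // Copy-and-sort: each size is read once, so the comparator is consistent.
    static <T extends SizedSketch> void sortBySnapshot(LinkedList<T> items) {
        List<Snapshot<T>> snaps = new ArrayList<Snapshot<T>>(items.size());
        for (T item : items) {
            snaps.add(new Snapshot<T>(item, item.getMemorySize()));
        }
        Collections.sort(snaps, new Comparator<Snapshot<T>>() {
            @Override public int compare(Snapshot<T> a, Snapshot<T> b) {
                return Long.compare(b.size, a.size);
            }
        });
        // Write the sorted order back into the original list.
        ListIterator<T> it = items.listIterator();
        for (Snapshot<T> s : snaps) {
            it.next();
            it.set(s.item);
        }
    }
}
```

When the exception does occur, this does the work of both sorts, which is the slowdown guessed at above. The snapshot path on its own is the simpler way to get a sort that cannot trip TimSort's contract check.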
My guess is that the memory consumed by the list is tiny relative to the size
of the spillables and the spilling we are trying to avoid, but my vision is
limited by my use cases.
> java.lang.IllegalArgumentException: Comparison method violates its general
> contract! SpillableMemoryManager
> -----------------------------------------------------------------------------------------------------------
>
> Key: PIG-4012
> URL: https://issues.apache.org/jira/browse/PIG-4012
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.12.0
> Environment: java version "1.7.0_60-ea"
> Java(TM) SE Runtime Environment (build 1.7.0_60-ea-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 24.60-b04, mixed mode)
> Reporter: David Dreyfus
> Assignee: David Dreyfus
> Fix For: 0.13.0
>
>
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
> at java.util.TimSort.mergeHi(TimSort.java:868)
> at java.util.TimSort.mergeAt(TimSort.java:485)
> at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
> at java.util.TimSort.sort(TimSort.java:223)
> at java.util.TimSort.sort(TimSort.java:173)
> at java.util.Arrays.sort(Arrays.java:659)
> at java.util.Collections.sort(Collections.java:217)
> at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:199)
> at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:156)
> at sun.management.MemoryImpl.createNotification(MemoryImpl.java:168)
> at sun.management.MemoryPoolImpl$PoolSensor.triggerAction(MemoryPoolImpl.java:301)
> at sun.management.Sensor.trigger(Sensor.java:137)
> From SpillableMemoryManager.java:
> /**
> * We don't lock anything, so this sort may not be stable if a WeakReference suddenly
> * becomes null, but it will be close enough.
> * Also between the time we sort and we use these spillables, they
> * may actually change in size - so this is just best effort
> */
> Issue may be due to Java 7 and reporting vs ignoring the exception.
> Trying
> -Djava.util.Arrays.useLegacyMergeSort=true
> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6804124
> suggests the newer MergeSort is much faster.
> Someone may want to make the sorting stable in SpillableMemoryManager so that
> the new merge sort can be used without failure.