[ https://issues.apache.org/jira/browse/SPARK-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286906#comment-15286906 ]

Sital Kedia commented on SPARK-13850:
-------------------------------------

I am not 100% sure of the root cause, but I suspect this happens when the JVM
tries to allocate a very large buffer for the pointer array. The JVM may be
unable to allocate such a large buffer in a contiguous region of the heap, and
since the unsafe operations assume that each object occupies contiguous memory,
any unsafe operation on the large buffer would corrupt memory, which then
manifests as the TimSort error.
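
For context on why memory corruption would surface as this particular error: TimSort does not verify the comparator contract directly; it throws "Comparison method violates its general contract!" when its merge step notices an inconsistency that a valid comparator could not produce. The Java sketch below is hypothetical and is not Spark code. `byKey`, `findViolation` and `strayWriteBreaksAntisymmetry` are illustrative names, and reading keys by index from a shared `long[]` is only a loose stand-in for a sorter that dereferences record pointers. It shows that a comparator whose logic is correct can still appear to break antisymmetry if the key bytes change between two comparisons.

```java
import java.util.Comparator;

public class ComparatorContractDemo {
    // Hypothetical comparator over record indices that re-reads each sort key
    // from a shared buffer on every call (illustration only; not Spark code).
    static Comparator<Integer> byKey(long[] keys) {
        return (a, b) -> Long.compare(keys[a], keys[b]);
    }

    // Returns a description of the first antisymmetry or transitivity violation
    // among indices 0..n-1, or null if the comparator is consistent on them.
    static String findViolation(int n, Comparator<Integer> cmp) {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // Antisymmetry: sgn(compare(i, j)) == -sgn(compare(j, i))
                if (Integer.signum(cmp.compare(i, j)) != -Integer.signum(cmp.compare(j, i))) {
                    return "antisymmetry violated for (" + i + ", " + j + ")";
                }
                for (int k = 0; k < n; k++) {
                    // Transitivity: i > j and j > k must imply i > k
                    if (cmp.compare(i, j) > 0 && cmp.compare(j, k) > 0 && cmp.compare(i, k) <= 0) {
                        return "transitivity violated for (" + i + ", " + j + ", " + k + ")";
                    }
                }
            }
        }
        return null;
    }

    // Simulates the suspected failure mode: a key changes between two
    // comparisons, so two results that were each correct when computed
    // contradict each other from the sorter's point of view.
    static boolean strayWriteBreaksAntisymmetry() {
        long[] keys = {1L, 2L, 3L};
        Comparator<Integer> cmp = byKey(keys);
        int ab = cmp.compare(0, 1); // 1 vs 2 -> negative
        keys[0] = 5L;               // simulated corruption of record 0's key
        int ba = cmp.compare(1, 0); // 2 vs 5 -> also negative
        return Integer.signum(ab) != -Integer.signum(ba);
    }

    public static void main(String[] args) {
        // Unmodified keys: no violation is found, so this prints "null".
        System.out.println(findViolation(3, byKey(new long[] {1L, 2L, 3L})));
        // Key modified between comparisons: prints "true".
        System.out.println(strayWriteBreaksAntisymmetry());
    }
}
```

In a real sort the contradictory results would be spread across many merge steps, so whether and where TimSort notices them depends on the input. This is consistent with the error appearing only on large datasets, but the sketch does not confirm the root cause.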

> TimSort Comparison method violates its general contract
> -------------------------------------------------------
>
>                 Key: SPARK-13850
>                 URL: https://issues.apache.org/jira/browse/SPARK-13850
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 1.6.0
>            Reporter: Sital Kedia
>
> While running a query that does a group by on a large dataset, the query
> fails with the following stack trace:
> {code}
> Job aborted due to stage failure: Task 4077 in stage 1.3 failed 4 times, most recent failure: Lost task 4077.3 in stage 1.3 (TID 88702, hadoop3030.prn2.facebook.com): java.lang.IllegalArgumentException: Comparison method violates its general contract!
>       at org.apache.spark.util.collection.TimSort$SortState.mergeLo(TimSort.java:794)
>       at org.apache.spark.util.collection.TimSort$SortState.mergeAt(TimSort.java:525)
>       at org.apache.spark.util.collection.TimSort$SortState.mergeCollapse(TimSort.java:453)
>       at org.apache.spark.util.collection.TimSort$SortState.access$200(TimSort.java:325)
>       at org.apache.spark.util.collection.TimSort.sort(TimSort.java:153)
>       at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
>       at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:228)
>       at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:186)
>       at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:175)
>       at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249)
>       at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112)
>       at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:318)
>       at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:333)
>       at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
>       at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
>       at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
>       at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
>       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
>       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>       at org.apache.spark.scheduler.Task.run(Task.scala:89)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> Please note that the same query used to succeed in Spark 1.5, so this appears
> to be a regression in 1.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)