[ https://issues.apache.org/jira/browse/SPARK-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell resolved SPARK-1632.
------------------------------------
Resolution: Fixed
Fix Version/s: 1.0.0
> Avoid boxing in ExternalAppendOnlyMap compares
> ----------------------------------------------
>
> Key: SPARK-1632
> URL: https://issues.apache.org/jira/browse/SPARK-1632
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 0.9.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Fix For: 1.0.0
>
>
> We're hitting an OutOfMemoryError in ExternalAppendOnlyMap.KCComparator while boxing an int. I
> don't know whether the boxing is the root cause of the OOM, but either way it is avoidable.
> Code:
> {code}
> // Int.hashCode() returns a primitive int; calling compareTo on it boxes
> // both values via Predef.int2Integer (visible in the trace below).
> def compare(kc1: (K, C), kc2: (K, C)): Int = {
>   kc1._1.hashCode().compareTo(kc2._1.hashCode())
> }
> {code}
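>
> One way to avoid the boxing is to compare the primitive hash codes directly instead of going through java.lang.Integer.compareTo. A minimal sketch follows; the class and method signatures match the trace above, but the exact shape of the eventual fix is an assumption:
> {code}
> import java.util.Comparator
>
> // Sketch only: compare keys by primitive hash code, with no Integer allocation.
> private class KCComparator[K, C] extends Comparator[(K, C)] {
>   def compare(kc1: (K, C), kc2: (K, C)): Int = {
>     val hash1 = kc1._1.hashCode() // stays a primitive Int
>     val hash2 = kc2._1.hashCode()
>     if (hash1 < hash2) -1 else if (hash1 == hash2) 0 else 1
>   }
> }
> {code}
> The explicit three-way comparison is deliberate: hash1 - hash2 would be branch-free but can overflow when the hash codes have opposite signs, producing a wrong ordering.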
> Error:
> {code}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at java.lang.Integer.valueOf(Integer.java:642)
>   at scala.Predef$.int2Integer(Predef.scala:370)
>   at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:432)
>   at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:430)
>   at org.apache.spark.util.collection.AppendOnlyMap$$anon$3.compare(AppendOnlyMap.scala:271)
>   at java.util.TimSort.mergeLo(TimSort.java:687)
>   at java.util.TimSort.mergeAt(TimSort.java:483)
>   at java.util.TimSort.mergeCollapse(TimSort.java:410)
>   at java.util.TimSort.sort(TimSort.java:214)
>   at java.util.Arrays.sort(Arrays.java:727)
>   at org.apache.spark.util.collection.AppendOnlyMap.destructiveSortedIterator(AppendOnlyMap.scala:274)
>   at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:188)
>   at org.apache.spark.util.collection.ExternalAppendOnlyMap.insert(ExternalAppendOnlyMap.scala:141)
>   at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>   at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
>   at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
>   at org.apache.spark.scheduler.Task.run(Task.scala:53)
>   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>   at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>   at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)