Sandy Ryza created SPARK-1632:
---------------------------------

             Summary: Avoid boxing in ExternalAppendOnlyMap.KCComparator
                 Key: SPARK-1632
                 URL: https://issues.apache.org/jira/browse/SPARK-1632
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 0.9.0
            Reporter: Sandy Ryza
            Assignee: Sandy Ryza


We're hitting an OutOfMemoryError in ExternalAppendOnlyMap.KCComparator while boxing an int. The boxing may not be the root cause of the OOME, but it is avoidable either way.

Code:
{code}
    def compare(kc1: (K, C), kc2: (K, C)): Int = {
      kc1._1.hashCode().compareTo(kc2._1.hashCode())
    }
{code}
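For illustration, the boxing can be avoided by branching on the primitive hash codes instead of calling {{compareTo}} (a sketch of the idea, not necessarily the exact patch that will be committed; the class name here is illustrative):

{code}
// Sketch of an allocation-free comparator. kc1._1.hashCode() already
// yields a primitive Int; calling .compareTo on it goes through
// Predef.int2Integer, boxing both operands on every comparison during
// the sort. Comparing the primitives directly avoids the allocation.
class HashComparator[K, C] extends java.util.Comparator[(K, C)] {
  def compare(kc1: (K, C), kc2: (K, C)): Int = {
    val hash1 = kc1._1.hashCode()
    val hash2 = kc2._1.hashCode()
    if (hash1 < hash2) -1 else if (hash1 == hash2) 0 else 1
  }
}
{code}

This preserves the existing ordering semantics (keys are ordered by hash code only), so it should be a drop-in replacement for the current comparator.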

Error:
{code}
java.lang.OutOfMemoryError: GC overhead limit exceeded
     at java.lang.Integer.valueOf(Integer.java:642)
     at scala.Predef$.int2Integer(Predef.scala:370)
     at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:432)
     at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:430)
     at org.apache.spark.util.collection.AppendOnlyMap$$anon$3.compare(AppendOnlyMap.scala:271)
     at java.util.TimSort.mergeLo(TimSort.java:687)
     at java.util.TimSort.mergeAt(TimSort.java:483)
     at java.util.TimSort.mergeCollapse(TimSort.java:410)
     at java.util.TimSort.sort(TimSort.java:214)
     at java.util.Arrays.sort(Arrays.java:727)
     at org.apache.spark.util.collection.AppendOnlyMap.destructiveSortedIterator(AppendOnlyMap.scala:274)
     at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:188)
     at org.apache.spark.util.collection.ExternalAppendOnlyMap.insert(ExternalAppendOnlyMap.scala:141)
     at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
     at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
     at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
     at org.apache.spark.scheduler.Task.run(Task.scala:53)
     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
     at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
     at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
     at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
