Dear developers,

I'm currently limited to using Spark 1.0.2.
I'm using Spark SQL on a Hive table to load the AMPLab benchmark dataset, which is approximately 25.6 GiB. First I create the external table:

CREATE EXTERNAL TABLE uservisits (
    sourceIP STRING,
    destURL STRING,
    visitDate STRING,
    adRevenue DOUBLE,
    userAgent STRING,
    countryCode STRING,
    languageCode STRING,
    searchWord STRING,
    duration INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\001"
STORED AS SEQUENCEFILE
LOCATION "/public/xxxx/data/uservisits"

That works. Next I run:

SELECT COUNT(*) FROM uservisits

That also works, and the result is correct. But when I run:

SELECT SUBSTR(sourceIP, 1, 8), SUM(adRevenue)
FROM uservisits
GROUP BY SUBSTR(sourceIP, 1, 8)

I get the errors shown in the log excerpt below (the two key messages are pulled out here). They boil down to two problems:

1. Akka: akka.pattern.AskTimeoutException ("Timed out") while removing shuffles
2. GC: java.lang.OutOfMemoryError ("GC overhead limit exceeded") in the executor tasks

What should I do?
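In case the execution path matters: the queries go through a HiveContext. A minimal sketch of such a session (assuming the stock spark-shell, where sc is predefined; this is representative rather than my exact script) looks like:

import org.apache.spark.sql.hive.HiveContext

// Build a HiveContext on the existing spark-shell SparkContext.
val hiveContext = new HiveContext(sc)

// In Spark 1.0.x, HiveQL is submitted through hql(); this is the
// query that fails for me.
val result = hiveContext.hql(
  """SELECT SUBSTR(sourceIP, 1, 8), SUM(adRevenue)
     FROM uservisits
     GROUP BY SUBSTR(sourceIP, 1, 8)""")

result.collect().foreach(println)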
Here is the relevant log excerpt (stack traces re-wrapped for readability; repeated identical traces are elided with a bracketed note):

...
14/10/05 23:45:18 INFO MemoryStore: Block broadcast_2 of size 158188 dropped from memory (free 308752285)
14/10/05 23:45:40 ERROR BlockManagerMaster: Failed to remove shuffle 4
akka.pattern.AskTimeoutException: Timed out
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)
    at akka.actor.Scheduler$$anon$11.run(Scheduler.scala:118)
    at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:455)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.executeBucket$1(Scheduler.scala:407)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:411)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
    at java.lang.Thread.run(Thread.java:745)
14/10/05 23:45:47 ERROR BlockManagerMaster: Failed to remove shuffle 0
akka.pattern.AskTimeoutException: Timed out
    [identical stack trace to shuffle 4 above]
14/10/05 23:45:46 ERROR BlockManagerMaster: Failed to remove shuffle 2
akka.pattern.AskTimeoutException: Timed out
    [identical stack trace to shuffle 4 above]
14/10/05 23:45:45 ERROR BlockManagerMaster: Failed to remove shuffle 1
akka.pattern.AskTimeoutException: Timed out
    [identical stack trace to shuffle 4 above]
14/10/05 23:45:40 ERROR BlockManagerMaster: Failed to remove shuffle 3
akka.pattern.AskTimeoutException: Timed out
    [identical stack trace to shuffle 4 above]
14/10/05 23:46:31 ERROR Executor: Exception in task ID 5280
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.spark.sql.catalyst.expressions.SumFunction.<init>(aggregates.scala:351)
    at org.apache.spark.sql.catalyst.expressions.Sum.newInstance(aggregates.scala:243)
    at org.apache.spark.sql.catalyst.expressions.Sum.newInstance(aggregates.scala:230)
    at org.apache.spark.sql.execution.Aggregate.org$apache$spark$sql$execution$Aggregate$$newAggregateBuffer(Aggregate.scala:99)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:163)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:153)
    at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571)
    at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
14/10/05 23:46:53 ERROR Executor: Exception in task ID 5270
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:201)
    at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:567)
    at java.nio.CharBuffer.toString(CharBuffer.java:1241)
    at org.apache.hadoop.io.Text.decode(Text.java:350)
    at org.apache.hadoop.io.Text.decode(Text.java:327)
    at org.apache.hadoop.io.Text.toString(Text.java:254)
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveJavaObject(LazyStringObjectInspector.java:52)
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveJavaObject(LazyStringObjectInspector.java:28)
    at org.apache.spark.sql.hive.HiveInspectors$class.unwrapData(hiveUdfs.scala:287)
    at org.apache.spark.sql.hive.execution.HiveTableScan.unwrapData(HiveTableScan.scala:48)
    at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$attributeFunctions$1$$anonfun$apply$3.apply(HiveTableScan.scala:101)
    at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$attributeFunctions$1$$anonfun$apply$3.apply(HiveTableScan.scala:99)
    at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$12$$anonfun$apply$5.apply(HiveTableScan.scala:203)
    at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$12$$anonfun$apply$5.apply(HiveTableScan.scala:200)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:159)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:153)
    at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571)
    at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
14/10/05 23:47:03 INFO ShuffleBlockManager: Deleted all files for shuffle 4
14/10/05 23:47:05 INFO ShuffleBlockManager: Deleted all files for shuffle 1
14/10/05 23:47:04 INFO ShuffleBlockManager: Deleted all files for shuffle 2
14/10/05 23:47:03 INFO ShuffleBlockManager: Deleted all files for shuffle 3
14/10/05 23:47:03 INFO ShuffleBlockManager: Deleted all files for shuffle 0
14/10/05 23:47:09 INFO TaskSetManager: Starting task 14.0:32 as TID 5290 on executor localhost: localhost (PROCESS_LOCAL)
14/10/05 23:47:17 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-92,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
    [identical stack trace to task ID 5270 above]
14/10/05 23:47:22 ERROR Executor: Exception in task ID 5267
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:164)
    at scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:45)
    at scala.collection.SeqLike$$anonfun$distinct$1.apply(SeqLike.scala:495)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.SeqLike$class.distinct(SeqLike.scala:493)
    at scala.collection.AbstractSeq.distinct(Seq.scala:40)
    at org.apache.spark.sql.catalyst.expressions.Coalesce.resolved$lzycompute(nullFunctions.scala:34)
    at org.apache.spark.sql.catalyst.expressions.Coalesce.resolved(nullFunctions.scala:34)
    at org.apache.spark.sql.catalyst.expressions.Coalesce.dataType(nullFunctions.scala:38)
    at org.apache.spark.sql.catalyst.expressions.Expression.n2(Expression.scala:100)
    at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:58)
    at org.apache.spark.sql.catalyst.expressions.MutableLiteral.update(literals.scala:72)
    at org.apache.spark.sql.catalyst.expressions.SumFunction.update(aggregates.scala:358)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:169)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:153)
    at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571)
    at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
14/10/05 23:47:17 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-100,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
    [identical stack trace to task ID 5280 above]
14/10/05 23:47:31 INFO TaskSetManager: Serialized task 14.0:32 as 4996 bytes in 13419 ms
14/10/05 23:47:52 ERROR Executor: Exception in task ID 5288
java.lang.OutOfMemoryError: GC overhead limit exceeded
    [same stack trace as task ID 5280, except the top frame is SumFunction.<init>(aggregates.scala:353)]
14/10/05 23:47:52 INFO Executor: Running task ID 5290
14/10/05 23:47:52 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-97,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
    [identical stack trace to task ID 5267 above]
14/10/05 23:47:52 INFO TaskSetManager: Starting task 14.0:33 as TID 5291 on executor localhost: localhost (PROCESS_LOCAL)
14/10/05 23:47:52 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-91,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
    [identical stack trace to task ID 5288 above]
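My guess is that the per-partition aggregation simply runs out of heap, so before retrying I'm considering giving the JVMs more memory and splitting the job into more partitions, along these lines (a sketch only; the values are illustrative guesses I have not validated):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Illustrative guesses only -- not validated values.
val conf = new SparkConf()
  .setAppName("uservisits-benchmark")
  .set("spark.executor.memory", "4g")       // more heap per executor (in local mode the driver's heap is what matters)
  .set("spark.default.parallelism", "200")  // more, smaller partitions per stage
  .set("spark.akka.askTimeout", "120")      // seconds; raise the Akka ask timeout

val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)

Is this the right direction on 1.0.2, or am I missing something else?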