Andrew - I think Spark is using Guava 14... are you using Guava 16 in your user app (i.e. you inverted the versions in your earlier e-mail)?

- Patrick
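For anyone trying to answer Patrick's question on their own cluster, a minimal sketch (illustrative only, not code from this thread; the object name is made up) of confirming which Guava jar the driver JVM and the executors actually load:

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative only: print the jar Guava's ImmutableList is loaded from,
    // once in the driver JVM and once inside an executor task.
    object GuavaWhereFrom {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("guava-where-from"))

        def jarOf(cls: Class[_]): String = {
          val src = cls.getProtectionDomain.getCodeSource
          if (src == null) "<bootstrap classpath>" else src.getLocation.toString
        }

        // Driver side, e.g. .../guava-14.0.1.jar
        println("driver:   " + jarOf(classOf[com.google.common.collect.ImmutableList[_]]))

        // Executor side, evaluated inside a task; the class is resolved on the worker.
        val onExecutor = sc.parallelize(Seq(1), 1).map { _ =>
          val src = classOf[com.google.common.collect.ImmutableList[_]]
            .getProtectionDomain.getCodeSource
          if (src == null) "<bootstrap classpath>" else src.getLocation.toString
        }.first()
        println("executor: " + onExecutor)

        sc.stop()
      }
    }

If the two printed paths point at different Guava versions, the driver and the cluster are serializing against different copies of the same classes.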
On Fri, Aug 1, 2014 at 4:15 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
> On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash <and...@andrewash.com> wrote:
> > After several days of debugging, we think the issue is that we have
> > conflicting versions of Guava. Our application was running with Guava 14,
> > and the Spark services (Master, Workers, Executors) had Guava 16. We had
> > custom Kryo serializers for Guava's ImmutableLists, and commenting out
> > those register calls did the trick.
> >
> > Have people had issues with Guava version mismatches in the past?
>
> There's some discussion about dealing with Guava version issues in
> Spark in SPARK-2420.
>
> best,
> Colin
>
> > I've found @srowen's Guava 14 -> 11 downgrade PR here
> > https://github.com/apache/spark/pull/1610 and some extended discussion on
> > https://issues.apache.org/jira/browse/SPARK-2420 for Hive compatibility
> >
> >
> > On Thu, Jul 31, 2014 at 10:47 AM, Andrew Ash <and...@andrewash.com> wrote:
> >
> >> Hi everyone,
> >>
> >> I'm seeing the below exception coming out of Spark 1.0.1 when I call it
> >> from my application. I can't share the source to that application, but the
> >> quick gist is that it uses Spark's Java APIs to read from Avro files in
> >> HDFS, do processing, and write back to Avro files. It does this by
> >> receiving a REST call, then spinning up a new JVM as the driver application
> >> that connects to Spark. I'm using CDH4.4.0 and have enabled Kryo and also
> >> speculation. The cluster is running in standalone mode on a 6-node cluster
> >> in AWS (not using Spark's EC2 scripts, though).
> >>
> >> The stack traces below are reliably reproducible on every run of the job.
> >> The issue seems to be that on deserialization of a task result on the
> >> driver, Kryo fails while reading the ClassManifest.
> >>
> >> I've tried swapping in Kryo 2.23.1 rather than 2.21 (2.22 had some
> >> backwards-compatibility issues) but hit the same error.
> >>
> >> Any ideas on what can be done here?
> >>
> >> Thanks!
> >> Andrew
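As context for the setup Andrew describes (Kryo plus speculation enabled, with custom serializers registered for Guava's ImmutableLists), here is a minimal sketch of what such a configuration typically looks like. It is illustrative only, not Andrew's actual code; the class and package names are assumptions.

    import com.esotericsoftware.kryo.{Kryo, Serializer}
    import com.esotericsoftware.kryo.io.{Input, Output}
    import com.google.common.collect.ImmutableList
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Hand-rolled Kryo serializer for Guava's ImmutableList (illustrative sketch).
    class ImmutableListSerializer extends Serializer[ImmutableList[AnyRef]] {
      override def write(kryo: Kryo, out: Output, list: ImmutableList[AnyRef]): Unit = {
        out.writeInt(list.size(), true)
        val it = list.iterator()
        while (it.hasNext) kryo.writeClassAndObject(out, it.next())
      }
      override def read(kryo: Kryo, in: Input, cls: Class[ImmutableList[AnyRef]]): ImmutableList[AnyRef] = {
        val size = in.readInt(true)
        val builder = ImmutableList.builder[AnyRef]()
        var i = 0
        while (i < size) { builder.add(kryo.readClassAndObject(in)); i += 1 }
        builder.build()
      }
    }

    // A registrator of the kind Andrew mentions commenting out. ImmutableList has
    // several concrete subclasses, so registrations usually cover the common shapes.
    class MyKryoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        val ser = new ImmutableListSerializer
        kryo.register(classOf[ImmutableList[_]], ser)
        kryo.register(ImmutableList.of().getClass, ser)        // empty list
        kryo.register(ImmutableList.of(1).getClass, ser)       // single element
        kryo.register(ImmutableList.of(1, 2, 3).getClass, ser) // general case
      }
    }

    // Enabling Kryo, the registrator (package name is hypothetical), and speculation.
    object AppConf {
      def build(): SparkConf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator", "com.example.MyKryoRegistrator")
        .set("spark.speculation", "true")
    }

Commenting out the kryo.register(...) calls corresponds to the workaround Andrew describes; the underlying problem of two Guava versions on the classpath remains.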
> >>
> >> In the driver (Kryo exception while deserializing a DirectTaskResult):
> >>
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | 20:52:52.667 [Result resolver thread-0] ERROR o.a.spark.scheduler.TaskResultGetter - Exception while getting task result
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | com.esotericsoftware.kryo.KryoException: Buffer underflow.
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at com.esotericsoftware.kryo.io.Input.require(Input.java:156) ~[kryo-2.21.jar:na]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337) ~[kryo-2.21.jar:na]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:762) ~[kryo-2.21.jar:na]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:624) ~[kryo-2.21.jar:na]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:26) ~[chill_2.10-0.3.6.jar:0.3.6]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:19) ~[chill_2.10-0.3.6.jar:0.3.6]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) ~[kryo-2.21.jar:na]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:147) ~[spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) ~[spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:480) ~[spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:316) ~[spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) [spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) [spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) [spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160) [spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) [spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
> >> INFO | jvm 1 | 2014/07/30 20:52:52 | at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> >>
> >>
> >> In the DAGScheduler (job gets aborted):
> >>
> >> org.apache.spark.SparkException: Job aborted due to stage failure:
> >> Exception while getting task result:
> >> com.esotericsoftware.kryo.KryoException: Buffer underflow.
> >> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
> >> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> >> at scala.Option.foreach(Option.scala:236)
> >> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
> >> at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
> >> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>
> >>
> >> In an Executor (running tasks get killed):
> >>
> >> 14/07/29 22:57:38 INFO broadcast.HttpBroadcast: Started reading broadcast variable 0
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 153
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 147
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 141
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 135
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 150
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 144
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 138
> >> 14/07/29 22:57:39 INFO storage.MemoryStore: ensureFreeSpace(241733) called with curMem=0, maxMem=30870601728
> >> 14/07/29 22:57:39 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 236.1 KB, free 28.8 GB)
> >> 14/07/29 22:57:39 INFO broadcast.HttpBroadcast: Reading broadcast variable 0 took 0.91790748 s
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 135
> >> org.apache.spark.TaskKilledException
> >> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:745)
> >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 144
> >> org.apache.spark.TaskKilledException
> >> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:745)
> >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 150
> >> org.apache.spark.TaskKilledException
> >> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:745)
> >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 138
> >> org.apache.spark.TaskKilledException
> >> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:745)
> >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 141
> >> org.apache.spark.TaskKilledException
> >> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:745)
> >>
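Beyond removing the serializer registrations, the longer-term fix that the SPARK-2420 discussion and Patrick's question both point toward is running a single Guava version on the driver and the cluster. As a hedged illustration (an sbt build is assumed; the coordinates are examples, not taken from the thread), one way to pin the user application to the Guava 14 that Patrick believes Spark 1.0.1 ships:

    // build.sbt (illustrative, not from the thread): pin a single Guava version so
    // the driver JVM and the Spark cluster agree on the Guava classes Kryo touches.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.0.1" % "provided",
      "com.google.guava"  % "guava"      % "14.0.1"
    )

    // Force any transitive pull of a newer Guava (e.g. 16.x) back to the pinned version.
    dependencyOverrides += "com.google.guava" % "guava" % "14.0.1"

Shading or relocating Guava inside the application jar is another common way out when the versions cannot be aligned.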