[ https://issues.apache.org/jira/browse/SPARK-757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324687#comment-14324687 ]
Shivaram Venkataraman commented on SPARK-757:
---------------------------------------------

Since the shuffle implementation has changed recently, I think this can be marked as obsolete.

> Deserialization Exception partway into long running job with Netty - MLbase
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-757
>                 URL: https://issues.apache.org/jira/browse/SPARK-757
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.8.0
>            Reporter: Evan Sparks
>            Assignee: Shivaram Venkataraman
>         Attachments: imgnet_shiv6.log.gz, joblogs.tgz, newerlogs.tgz, newworklog.log, shivlogs.tgz
>
>
> Using Netty for communication I see some deserialization errors that are crashing my job about 30% of the way through an iterative 10-step job. It happens reliably around the same point of the job after multiple attempts. Logs from the master and a couple of affected workers are attached, per request from Shivaram.
>
> 13/05/31 23:19:12 INFO cluster.TaskSetManager: Serialized task 11.0:454 as 3414 bytes in 0 ms
> 13/05/31 23:19:14 INFO cluster.TaskSetManager: Finished TID 11344 in 55289 ms (progress: 312/1000)
> 13/05/31 23:19:14 INFO scheduler.DAGScheduler: Completed ResultTask(11, 344)
> 13/05/31 23:19:14 INFO cluster.ClusterScheduler: parentName:,name:TaskSet_11,runningTasks:143
> 13/05/31 23:19:14 INFO cluster.TaskSetManager: Starting task 11.0:455 as TID 11455 on slave 8: ip-10-60-217-218.ec2.internal:56262 (NODE_LOCAL)
> 13/05/31 23:19:14 INFO cluster.TaskSetManager: Serialized task 11.0:455 as 3414 bytes in 0 ms
> 13/05/31 23:19:17 INFO cluster.TaskSetManager: Lost TID 11412 (task 11.0:412)
> 13/05/31 23:19:17 INFO cluster.TaskSetManager: Loss was due to java.io.EOFException
> java.io.EOFException
>         at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2322)
>         at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2791)
>         at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:798)
>         at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298)
>         at spark.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:18)
>         at spark.JavaDeserializationStream.<init>(JavaSerializer.scala:18)
>         at spark.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:53)
>         at spark.storage.BlockManager.dataDeserialize(BlockManager.scala:925)
>         at spark.storage.BlockFetcherIterator$NettyBlockFetcherIterator$$anonfun$5.apply(BlockFetcherIterator.scala:279)
>         at spark.storage.BlockFetcherIterator$NettyBlockFetcherIterator$$anonfun$5.apply(BlockFetcherIterator.scala:279)
>         at spark.storage.BlockFetcherIterator$NettyBlockFetcherIterator.next(BlockFetcherIterator.scala:318)
>         at spark.storage.BlockFetcherIterator$NettyBlockFetcherIterator.next(BlockFetcherIterator.scala:239)
>         at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
>         at spark.util.CompletionIterator.hasNext(CompletionIterator.scala:9)
>         at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:457)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>         at scala.collection.Iterator$$anon$22.foreach(Iterator.scala:451)
>         at spark.Aggregator.combineCombinersByKey(Aggregator.scala:33)
>         at spark.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:72)
>         at spark.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:72)
>         at spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:19)
>         at spark.RDD.computeOrReadCheckpoint(RDD.scala:220)
>         at spark.RDD.iterator(RDD.scala:209)
>         at spark.scheduler.ResultTask.run(ResultTask.scala:84)
>         at spark.executor.Executor$TaskRunner.run(Executor.scala:104)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:679)

--
This message was sent by Atlassian JIRA
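The top of the trace shows the failure happening inside the `java.io.ObjectInputStream` constructor itself: it eagerly reads the 4-byte serialization stream header, so if the Netty fetch hands `dataDeserialize` an empty or truncated block, the `EOFException` fires before a single object is read. A minimal sketch of that behavior, assuming only standard JDK serialization (the `EofDemo` class and `failsHeaderRead` helper are illustrative names, not part of Spark):

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;

public class EofDemo {
    // Returns true if constructing an ObjectInputStream over the given bytes
    // fails with EOFException while reading the serialization stream header,
    // which is what a truncated or empty fetched block would produce.
    static boolean failsHeaderRead(byte[] blockBytes) throws IOException {
        try {
            new ObjectInputStream(new ByteArrayInputStream(blockBytes)).close();
            return false;
        } catch (EOFException e) {
            // readStreamHeader -> readShort -> readFully hit end-of-stream,
            // matching the top frames of the trace above.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(failsHeaderRead(new byte[0]));   // true
    }
}
```

This suggests the deserializer itself is behaving correctly and the truncation happens earlier, in the block transfer or on-disk shuffle data, which is consistent with closing the issue once the shuffle implementation was replaced.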
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org