[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307719#comment-14307719 ]

Philippe Girolami commented on SPARK-1867:
------------------------------------------

[~srowen] I am, unfortunately, reporting this bug. To mitigate SPARK-5557, I've 
reverted my working branch to commit cd5da42 until it gets sorted out. I should 
have included the stack trace from the start. Someone could easily verify by 
doing a clean clone, checking out cd5da42, and building the way I describe; 
then it's simply a matter of launching spark-shell. If that works for you, then 
I agree the problem is on my side, but I can't imagine how it could be, given 
the steps I describe to reproduce it.
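
For reference, the repro inside spark-shell comes down to these two lines (full session log below):

{code}
// Read any local text file and write it straight back out; the save step fails.
val source = sc.textFile("/tmp/test")
source.saveAsTextFile("/tmp/test_spark_outputA")
{code}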

{code}
Philippes-MacBook-Air-3:spark Philippe$ bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/02/05 19:13:15 INFO SecurityManager: Changing view acls to: Philippe
15/02/05 19:13:15 INFO SecurityManager: Changing modify acls to: Philippe
15/02/05 19:13:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Philippe); users with modify permissions: Set(Philippe)
15/02/05 19:13:15 INFO HttpServer: Starting HTTP Server
15/02/05 19:13:16 INFO Utils: Successfully started service 'HTTP class server' on port 61040.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.0-SNAPSHOT
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
15/02/05 19:13:21 INFO SparkContext: Running Spark version 1.3.0-SNAPSHOT
15/02/05 19:13:21 INFO SecurityManager: Changing view acls to: Philippe
15/02/05 19:13:21 INFO SecurityManager: Changing modify acls to: Philippe
15/02/05 19:13:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Philippe); users with modify permissions: Set(Philippe)
15/02/05 19:13:22 INFO Slf4jLogger: Slf4jLogger started
15/02/05 19:13:22 INFO Remoting: Starting remoting
15/02/05 19:13:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:61043]
15/02/05 19:13:22 INFO Utils: Successfully started service 'sparkDriver' on port 61043.
15/02/05 19:13:22 INFO SparkEnv: Registering MapOutputTracker
15/02/05 19:13:22 INFO SparkEnv: Registering BlockManagerMaster
15/02/05 19:13:22 INFO DiskBlockManager: Created local directory at /var/folders/8r/0ty24ys52kvdvx8r6nz2cdc00000gn/T/spark-local-20150205191322-7e22
15/02/05 19:13:22 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/02/05 19:13:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/05 19:13:23 INFO HttpFileServer: HTTP File server directory is /var/folders/8r/0ty24ys52kvdvx8r6nz2cdc00000gn/T/spark-8400830a-a7fc-4909-ae37-ee4b48e3ff88
15/02/05 19:13:23 INFO HttpServer: Starting HTTP Server
15/02/05 19:13:23 INFO Utils: Successfully started service 'HTTP file server' on port 61044.
15/02/05 19:13:23 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
15/02/05 19:13:23 INFO Utils: Successfully started service 'SparkUI' on port 4041.
15/02/05 19:13:23 INFO SparkUI: Started SparkUI at http://192.168.1.31:4041
15/02/05 19:13:23 INFO Executor: Using REPL class URI: http://192.168.1.31:61040
15/02/05 19:13:23 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://[email protected]:61043/user/HeartbeatReceiver
15/02/05 19:13:23 INFO NettyBlockTransferService: Server created on 61046
15/02/05 19:13:23 INFO BlockManagerMaster: Trying to register BlockManager
15/02/05 19:13:23 INFO BlockManagerMasterActor: Registering block manager localhost:61046 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 61046)
15/02/05 19:13:23 INFO BlockManagerMaster: Registered BlockManager
15/02/05 19:13:23 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala> val source = sc.textFile("/tmp/test")
15/02/05 19:13:27 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=278302556
15/02/05 19:13:27 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 265.3 MB)
15/02/05 19:13:27 INFO MemoryStore: ensureFreeSpace(22736) called with curMem=163705, maxMem=278302556
15/02/05 19:13:27 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 265.2 MB)
15/02/05 19:13:27 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:61046 (size: 22.2 KB, free: 265.4 MB)
15/02/05 19:13:27 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/05 19:13:27 INFO SparkContext: Created broadcast 0 from textFile at <console>:12
source: org.apache.spark.rdd.RDD[String] = /tmp/test MapPartitionsRDD[1] at textFile at <console>:12

scala> source.saveAsTextFile("/tmp/test_spark_outputA")
15/02/05 19:13:32 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/02/05 19:13:32 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/02/05 19:13:32 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/02/05 19:13:32 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/02/05 19:13:32 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/02/05 19:13:32 INFO FileInputFormat: Total input paths to process : 1
15/02/05 19:13:32 INFO SparkContext: Starting job: saveAsTextFile at <console>:15
15/02/05 19:13:32 INFO DAGScheduler: Got job 0 (saveAsTextFile at <console>:15) with 2 output partitions (allowLocal=false)
15/02/05 19:13:32 INFO DAGScheduler: Final stage: Stage 0(saveAsTextFile at <console>:15)
15/02/05 19:13:32 INFO DAGScheduler: Parents of final stage: List()
15/02/05 19:13:32 INFO DAGScheduler: Missing parents: List()
15/02/05 19:13:32 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at saveAsTextFile at <console>:15), which has no missing parents
15/02/05 19:13:32 INFO MemoryStore: ensureFreeSpace(112600) called with curMem=186441, maxMem=278302556
15/02/05 19:13:32 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 110.0 KB, free 265.1 MB)
15/02/05 19:13:32 INFO MemoryStore: ensureFreeSpace(67269) called with curMem=299041, maxMem=278302556
15/02/05 19:13:32 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 65.7 KB, free 265.1 MB)
15/02/05 19:13:32 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:61046 (size: 65.7 KB, free: 265.3 MB)
15/02/05 19:13:32 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/02/05 19:13:32 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:842
15/02/05 19:13:32 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[2] at saveAsTextFile at <console>:15)
15/02/05 19:13:32 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/02/05 19:13:32 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1278 bytes)
15/02/05 19:13:32 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1278 bytes)
15/02/05 19:13:32 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/02/05 19:13:32 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/02/05 19:13:32 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

{code}

> Spark Documentation Error causes java.lang.IllegalStateException: unread block data
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-1867
>                 URL: https://issues.apache.org/jira/browse/SPARK-1867
>             Project: Spark
>          Issue Type: Bug
>            Reporter: sam
>
> I've employed two System Administrators on a contract basis (for quite a bit 
> of money), and both contractors have independently hit the following 
> exception.  What we are doing is:
> 1. Installing Spark 0.9.1 according to the documentation on the website, 
> along with the CDH4 (and, on another cluster, CDH5) distributions of Hadoop/HDFS.
> 2. Building a fat jar containing a Spark app with sbt, then trying to run it 
> on the cluster.
> I've also included code snippets and sbt deps at the bottom.
> When I've Googled this, there seem to be two somewhat vague responses:
> a) Mismatching Spark versions on nodes/user code
> b) Need to add more jars to the SparkConf
> Now I know that (b) is not the problem, having successfully run the same code 
> on other clusters while including only one jar (it's a fat jar).
> But I have no idea how to check for (a) - it appears Spark doesn't do any 
> version checking at all - it would be nice if it checked versions and threw a 
> "mismatching version exception: you have user code using version X and node Y 
> has version Z", something like the sketch below.
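> This sketch is what I have in mind (purely hypothetical: the executorVersion 
> hook and a public SPARK_VERSION constant are made up, since I can't find 
> either in 0.9.1):
> // Hypothetical fail-fast check, run when an executor registers with the driver.
> def checkVersion(executorHost: String, executorVersion: String): Unit = {
>   val driverVersion = org.apache.spark.SPARK_VERSION  // assumed constant
>   if (executorVersion != driverVersion) {
>     throw new org.apache.spark.SparkException("mismatching version exception: " +
>       "you have user code using version " + driverVersion + " and node " +
>       executorHost + " has version " + executorVersion)
>   }
> }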
> I would be very grateful for advice on this.
> The exception:
> Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
>       at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>       at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>       at scala.Option.foreach(Option.scala:236)
>       at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>       at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException: unread block data [duplicate 59]
> My code snippet:
> import org.apache.spark.{SparkConf, SparkContext}
> val conf = new SparkConf()
>                .setMaster(clusterMaster)
>                .setAppName(appName)
>                .setSparkHome(sparkHome)
>                .setJars(SparkContext.jarOfClass(this.getClass))
> println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())
> My SBT dependencies:
> // relevant
> "org.apache.spark" % "spark-core_2.10" % "0.9.1",
> "org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
> // standard, probably unrelated
> "com.github.seratch" %% "awscala" % "[0.2,)",
> "org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
> "org.specs2" %% "specs2" % "1.14" % "test",
> "org.scala-lang" % "scala-reflect" % "2.10.3",
> "org.scalaz" %% "scalaz-core" % "7.0.5",
> "net.minidev" % "json-smart" % "1.2"


