[
https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008326#comment-14008326
]
sam commented on SPARK-1867:
----------------------------
[~srowen] Thanks for your response, I believe I understand the situation a
little better now.
// you say your app uses HBase but I see no dependence on the client libraries.
//
Yes, I removed everything I thought might be related to the exception in order
to simplify the problem.
// There shouldn't be any trial and error to it, if you express as dependencies
all the things you directly use. //
How am I supposed to know which artefact contains the packages I use? The only
idea I have is a bash script to wget all the Cloudera artefacts and then grep
their contents for the packages (a sketch of that idea follows below). To make
things worse, between every version of CDH they move things around: to get our
HBase code working we used to need just "hbase" % "0.94.15-cdh4.6.0", but now
that we are on CDH 5.0.0 we need half a dozen artefacts.
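To illustrate, here is a sketch of that idea in Scala rather than bash (the
directory of downloaded artefacts and the package path are just placeholders):

import java.io.File
import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Lists every jar under `dir` that contains an entry under `packagePath`,
// e.g. packagePath = "org/apache/hadoop/hbase". Point `dir` at a directory
// of artefacts pulled down from the Cloudera repo.
object FindArtefact {
  def jarsContaining(dir: File, packagePath: String): Seq[File] =
    Option(dir.listFiles).getOrElse(Array.empty[File])
      .filter(_.getName.endsWith(".jar"))
      .filter { f =>
        val jar = new JarFile(f)
        try jar.entries().asScala.exists(_.getName.startsWith(packagePath))
        finally jar.close()
      }
      .toSeq

  def main(args: Array[String]): Unit =
    jarsContaining(new File(args(0)), args(1)).foreach(println)
}

It doesn't solve the version-shuffling problem, but at least it answers "which
artefact owns this package" mechanically instead of by trial and error.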
// This is nothing to do with Spark per se. //
Yes and no. In order for Spark to work, one has to match up all the
dependencies in the Hadoop ecosystem with the app jar and the Spark jars
correctly, and that is currently very painful (one workaround we are trying is
sketched below).
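One pattern that seems to reduce the pain, assuming the cluster already ships
Spark and Hadoop at runtime, is to mark those dependencies as "provided" so the
build compiles against them but leaves them out of the fat jar, letting the
cluster's own versions win:

// build.sbt sketch -- the version strings are examples, not a recommendation
val sparkVersion  = "0.9.1"
val hadoopVersion = "2.3.0-mr1-cdh5.0.0"

libraryDependencies ++= Seq(
  // "provided": available at compile time, excluded from the assembled jar
  "org.apache.spark"  % "spark-core_2.10" % sparkVersion  % "provided",
  "org.apache.hadoop" % "hadoop-client"   % hadoopVersion % "provided"
)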
[~michaelmalak] Please gimme an upvote :)
http://stackoverflow.com/questions/23710032/apache-spark-throws-java-lang-illegalstateexception-unread-block-data
> Spark Documentation Error causes java.lang.IllegalStateException: unread
> block data
> -----------------------------------------------------------------------------------
>
> Key: SPARK-1867
> URL: https://issues.apache.org/jira/browse/SPARK-1867
> Project: Spark
> Issue Type: Bug
> Reporter: sam
>
> I've employed two System Administrators on a contract basis (for quite a bit
> of money), and both contractors have independently hit the following
> exception. What we are doing is:
> 1. Installing Spark 0.9.1 according to the documentation on the website,
> along with the CDH4 (and, on another cluster, CDH5) distro of Hadoop/HDFS.
> 2. Building a fat jar of a Spark app with sbt, then trying to run it on the
> cluster.
> I've also included code snippets and sbt deps at the bottom.
> When I've Googled this, there seem to be two somewhat vague responses:
> a) Mismatching Spark versions on nodes/user code
> b) Need to add more jars to the SparkConf
> Now I know that (b) is not the problem, having successfully run the same code
> on other clusters while including only one jar (it's a fat jar).
> But I have no idea how to check for (a); it appears Spark doesn't have any
> version checks at all. It would be nice if it checked versions and threw a
> "mismatching version exception: you have user code using version X and node Y
> has version Z" (a toy sketch of what I mean follows below).
> I would be very grateful for advice on this.
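> Purely as an illustration of the shape of check I mean (none of this is real
> Spark code; where the two version strings would come from is the hard part):
> // Hypothetical: driver and worker each report a version string during
> // handshake, and the scheduler fails fast on a mismatch instead of letting
> // deserialization blow up later with "unread block data".
> def checkVersions(userCodeVersion: String, nodeVersion: String): Unit =
>   if (userCodeVersion != nodeVersion)
>     throw new IllegalStateException(
>       "mismatching version exception: you have user code using version " +
>       userCodeVersion + " and this node has version " + nodeVersion)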
> The exception:
> Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException: unread block data [duplicate 59]
> My code snippet:
> val conf = new SparkConf()
>   .setMaster(clusterMaster)
>   .setAppName(appName)
>   .setSparkHome(sparkHome)
>   .setJars(SparkContext.jarOfClass(this.getClass))
> println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())
> My SBT dependencies:
> // relevant
> "org.apache.spark" % "spark-core_2.10" % "0.9.1",
> "org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
> // standard, probably unrelated
> "com.github.seratch" %% "awscala" % "[0.2,)",
> "org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
> "org.specs2" %% "specs2" % "1.14" % "test",
> "org.scala-lang" % "scala-reflect" % "2.10.3",
> "org.scalaz" %% "scalaz-core" % "7.0.5",
> "net.minidev" % "json-smart" % "1.2"
--
This message was sent by Atlassian JIRA
(v6.2#6252)