[ https://issues.apache.org/jira/browse/TOREE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240452#comment-16240452 ]
Paul Balm edited comment on TOREE-428 at 11/6/17 4:13 PM:
----------------------------------------------------------
I can confirm this issue. This is my test case (slightly simpler):
* Create the test data from the terminal: {{F=arraystore.csv ; echo a > $F; echo b >> $F; echo c >> $F}}
* Read the file into an RDD with a case class:
{noformat}
case class IdClass(id: String)
sc.textFile("arraystore.csv").map(IdClass).collect()
{noformat}
This produces the stack trace in the description.

An {{ArrayStoreException}} is normally an indication that an object is being stored in an array with an incompatible element type — for example, when you have an array of Strings and you try to put an {{IdClass}} or {{Person}} object into it.

As a wild guess as to what might be going on here: if you define {{IdClass}} on one thread with a given ClassLoader, and you reload it using another ClassLoader, it is not considered the same class. So if you create {{IdClass}} objects on one thread and you create the {{Array[IdClass]}} on another thread that has a different ClassLoader, you would get an {{ArrayStoreException}} when putting the objects into the array. In a normal application this doesn't happen, because no classes are defined at run time; they are all loaded by the system ClassLoader from the JARs on the class path. A class defined at run time, however, cannot be loaded by the system ClassLoader: you have to set up a custom ClassLoader and make sure you use it consistently across all threads that will use the newly defined class.
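The ClassLoader hypothesis above rests on how the JVM checks array stores at run time: every store into an object array is checked against the array's *runtime* element type. A minimal sketch in Java (the check is JVM-level, so the mechanism is the same as in the Scala notebook; the class and variable names are my own, and this illustrates only the exception mechanism, not the Toree bug itself):

```java
// ArrayStoreDemo: the JVM checks the runtime element type of an array
// on every store into an object array; a mismatch raises ArrayStoreException.
public class ArrayStoreDemo {
    public static void main(String[] args) {
        Object[] arr = new String[3]; // static type Object[], runtime type String[]
        arr[0] = "ok";                // fine: String matches the runtime element type
        try {
            arr[1] = Integer.valueOf(42); // Integer is not a String
            System.out.println("no exception");
        } catch (ArrayStoreException e) {
            System.out.println("caught ArrayStoreException");
        }
    }
}
```

Two classes with the same name loaded by different ClassLoaders are just as incompatible to the JVM as String and Integer here, which is why a ClassLoader mismatch between threads would produce exactly this exception.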
> Can't use case class in the Scala notebook
> ------------------------------------------
>
> Key: TOREE-428
> URL: https://issues.apache.org/jira/browse/TOREE-428
> Project: TOREE
> Issue Type: Bug
> Components: Build
> Reporter: Haifeng Li
>
> The Docker image: jupyter/all-spark-notebook:latest
> The way to start Docker:
> docker run -it --rm -p 8888:8888 jupyter/all-spark-notebook:latest
> or
> docker ps -a
> docker start -i containerID
> The steps:
> Visit http://localhost:8888
> Start a Toree notebook
> Input the code below:
> {code:java}
> import spark.implicits._
> val p = spark.sparkContext.textFile("../Data/person.txt")
> val pmap = p.map(_.split(","))
> pmap.collect()
> {code}
> The output: res0: Array[Array[String]] = Array(Array(Barack, Obama, 53), Array(George, Bush, 68), Array(Bill, Clinton, 68))
> {code:java}
> case class Persons(first_name: String, last_name: String, age: Int)
> val personRDD = pmap.map(p => Persons(p(0), p(1), p(2).toInt))
> personRDD.take(1)
> {code}
> The error message:
> {code:java}
> org.apache.spark.SparkDriverExecutionException: Execution error
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1186)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1711)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
>   at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
>   ... 39 elided
> Caused by: java.lang.ArrayStoreException: [LPersons;
>   at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:2043)
>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:2043)
>   at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:59)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1182)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1711)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> The above code works in the spark-shell. From the error message, I speculate that the driver program did not correctly handle the case class Persons when collecting results from the RDD partitions.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)