[ https://issues.apache.org/jira/browse/SPARK-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277217#comment-14277217 ]
Sean Owen commented on SPARK-5235:
----------------------------------

[~alexbaretta] It certainly may not be your code, of course; by "people" I also mean the Spark code itself. But surely the problem is solved exactly by not trying to serialize {{SQLContext}}? Despite its declaration, as you've demonstrated, it does not actually serialize, and given the {{@transient}} field it was not designed to be used after serialization. You've suggested a reasonable band-aid on a band-aid, but I would rather either fix the root cause or understand why it is actually supposed to behave this way. Other contexts in Spark are not supposed to be serialized. Where I've seen this pattern before, in the unit tests, it was a hack for convenience that didn't matter much because it was only a test.

Can you run with {{-Dsun.io.serialization.extendeddebuginfo=true}}? This will show exactly what held the reference to {{SQLContext}}.

> java.io.NotSerializableException: org.apache.spark.sql.SQLConf
> --------------------------------------------------------------
>
>          Key: SPARK-5235
>          URL: https://issues.apache.org/jira/browse/SPARK-5235
>      Project: Spark
>   Issue Type: Bug
>     Reporter: Alex Baretta
>
> The SQLConf field in SQLContext is neither Serializable nor transient. Here's
> the stack trace I get when running SQL queries against a Parquet file.
> Exception in thread "Thread-43" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task not serializable:
> java.io.NotSerializableException: org.apache.spark.sql.SQLConf
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1195)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1184)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1183)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1183)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:843)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:779)
> 	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:763)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1364)
> 	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1356)
> 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> 	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
> 	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
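The point about the {{@transient}} field can be seen in a minimal, self-contained sketch (hypothetical classes, not Spark's actual {{SQLContext}}): a transient field is silently dropped during Java serialization and comes back as null, which is why an object built around such a field is not meant to be used after a serialization round trip.

```java
import java.io.*;

// Hedged sketch: Context stands in for a class like SQLContext whose
// key field (here a String, in Spark a SparkContext) is @transient.
public class TransientDemo {
    static class Context implements Serializable {
        transient String sparkContext = "live-context"; // dropped on serialization
    }

    // Serialize an object to bytes and read it back.
    static Object roundTrip(Object o) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Context copy = (Context) roundTrip(new Context());
        // The transient field was not written, so it deserializes as null.
        System.out.println("after deserialization: " + copy.sparkContext);
    }
}
```

So even when serialization succeeds, the deserialized copy is crippled: any method touching the transient field would hit a null, which is the sense in which the class "was not designed to be used after serialization".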