[ https://issues.apache.org/jira/browse/SPARK-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108871#comment-15108871 ]
Thomas Graves commented on SPARK-12930: --------------------------------------- Note that change the query to remove the ['pos'] from info['pos'] int he select part of the command works around the issue. Stack trace from exception: java.lang.NullPointerException Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1296) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1284) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1283) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1283) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1509) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1471) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1460) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824) > NullPointerException running hive query with array dereference in select and > where clause > ----------------------------------------------------------------------------------------- > > Key: SPARK-12930 > URL: https://issues.apache.org/jira/browse/SPARK-12930 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.5.2 > Reporter: Thomas Graves > > I had a user doing a hive query from spark where they had a array dereference > in the select clause and in the where clause, it gave the user a > NullPointerException when the where clause should have filtered it out. Its > like spark is evaluating the select part before running the where clause. > The info['pos'] below is what caused the issue: > Query looked like: > SELECT foo, > info['pos'] AS pos > FROM db.table > WHERE date >= '$initialDate' AND > date <= '$finalDate' AND > info is not null AND > info['pos'] is not null > LIMIT 10 -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org