[ https://issues.apache.org/jira/browse/SPARK-26770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin updated SPARK-26770:
-----------------------------------
    Component/s: (was: Spark Core) SQL

Misleading/unhelpful error message when wrapping a null in an Option
--------------------------------------------------------------------

                 Key: SPARK-26770
                 URL: https://issues.apache.org/jira/browse/SPARK-26770
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.2
            Reporter: sam
            Priority: Major

This

{code}
// Using options to indicate nullable fields
case class Product(productID: Option[Int],
                   productName: Option[String])

val productExtract: Dataset[Product] =
  spark.createDataset(Seq(
    Product(
      productID = Some(6050286),
      // user mistake here, should be `None` not `Some(null)`
      productName = Some(null)
    )))

productExtract.count()
{code}

will give an error like the one below. The error is thrown from quite deep down, but there should be some handling logic further up that checks for nulls and gives a more informative message. For example, it could tell the user which field is null, or it could detect the `Some(null)` mistake and suggest using `None` instead.

Whatever the exception is, it shouldn't be an NPE: this is clearly a user error, so it should surface as some kind of user-error exception.
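For reference, the user-side fix is to build the option with the `Option(...)` factory rather than `Some(...)`: `Option.apply` maps `null` to `None`, while `Some.apply` happily wraps the `null`. A minimal sketch in plain Scala (no Spark needed) showing the difference:

```scala
// The mistake in the report: Some(null) is a *non-empty* Option that
// contains null, which Spark's UnsafeRowWriter later dereferences (NPE).
val wrong: Option[String] = Some(null)

// Option.apply checks for null and returns None, which Spark encodes
// correctly as a null column value.
val right: Option[String] = Option(null)

println(wrong.isDefined) // true  -- holds a null, the trap
println(right.isEmpty)   // true  -- safely empty
```

This is exactly the distinction a friendlier error message could point users at.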
{code}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 276, 10.139.64.8, executor 1): java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.serializefromobject_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.mapelements_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:620)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:112)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}

I've seen quite a few other people hit this error, though I don't think it was for the same reason:

https://docs.databricks.com/spark/latest/data-sources/tips/redshift-npe.html
https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/Dt6ilC9Dn54
https://issues.apache.org/jira/browse/SPARK-17195
https://issues.apache.org/jira/browse/SPARK-18859
https://github.com/datastax/spark-cassandra-connector/issues/1062
https://stackoverflow.com/questions/39875711/spark-sql-2-0-nullpointerexception-with-a-valid-postgresql-query

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)