[
https://issues.apache.org/jira/browse/SPARK-28702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905802#comment-16905802
]
Shivu Sondur commented on SPARK-28702:
--------------------------------------
I will check this issue.
> Display useful error message (instead of NPE) for invalid Dataset operations
> (e.g. calling actions inside of transformations)
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-28702
> URL: https://issues.apache.org/jira/browse/SPARK-28702
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Josh Rosen
> Priority: Major
>
> In Spark, SparkContext and SparkSession can only be used on the driver, not
> on executors. For example, this means that you cannot call
> {{someDataset.collect()}} inside of a Dataset or RDD transformation.
> When Spark serializes RDDs and Datasets, references to SparkContext and
> SparkSession are nulled out (by being marked as {{@transient}} or via the
> Closure Cleaner). As a result, RDD and Dataset methods which reference
> these driver-side-only objects (e.g. actions or transformations) will see
> {{null}} references and may fail with a {{NullPointerException}}. For
> example, in code which (via a chain of calls) tried to {{collect()}} a
> dataset inside of a Dataset.map operation:
> {code:java}Caused by: java.lang.NullPointerException
>   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$rddQueryExecution$lzycompute(Dataset.scala:3027)
>   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$rddQueryExecution(Dataset.scala:3025)
>   at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3038)
>   at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3036)
> [...] {code}
> The resulting NPE can be _very_ confusing to users.
> In SPARK-5063 I added some logic to throw clearer error messages when
> performing similar invalid actions on RDDs. This ticket's scope is to
> implement similar logic for Datasets.
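
The mechanism described above can be reproduced without Spark. The following is a minimal Java sketch, not Spark code: FakeSession, FakeDataset, collectUnchecked, collectChecked and roundTrip are hypothetical names invented for illustration, and the error message text is an assumption, not the wording used by SPARK-5063 or by any eventual fix. It shows a transient reference becoming null after serialization, the bare NPE that results, and a guard that raises a descriptive error instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class TransientSessionDemo {
    // Hypothetical stand-in for a driver-only object such as SparkSession.
    static class FakeSession {
        String run(String plan) {
            return "result of " + plan;
        }
    }

    // Hypothetical stand-in for a Dataset that holds a driver-only reference.
    static class FakeDataset implements Serializable {
        private static final long serialVersionUID = 1L;
        // transient: skipped by Java serialization, so it is null after a round trip.
        private transient FakeSession session;
        private final String plan;

        FakeDataset(FakeSession session, String plan) {
            this.session = session;
            this.plan = plan;
        }

        // Unguarded action: throws a bare NullPointerException on a deserialized copy.
        String collectUnchecked() {
            return session.run(plan);
        }

        // Guarded action: same behavior on the driver, descriptive error elsewhere.
        String collectChecked() {
            if (session == null) {
                // Assumed wording, for illustration only.
                throw new IllegalStateException(
                    "This dataset has no session: actions can only be invoked on the "
                    + "driver, not inside other transformations.");
            }
            return session.run(plan);
        }
    }

    // Serializes and deserializes the dataset, mimicking shipment to an executor.
    static FakeDataset roundTrip(FakeDataset ds) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(ds);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FakeDataset) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FakeDataset onDriver = new FakeDataset(new FakeSession(), "SELECT 1");
        System.out.println(onDriver.collectChecked());

        FakeDataset onExecutor = roundTrip(onDriver);
        try {
            onExecutor.collectUnchecked();
        } catch (NullPointerException e) {
            System.out.println("unguarded: bare NullPointerException");
        }
        try {
            onExecutor.collectChecked();
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

The guard is a plain null check at the entry point of the action, so it costs nothing on the driver and only changes which exception the user sees on the executor side.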