[ https://issues.apache.org/jira/browse/SPARK-54302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039281#comment-18039281 ]

Vindhya G commented on SPARK-54302:
-----------------------------------

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/classic/Dataset.scala#L945]

If I understand correctly, filter does not change the schema inherited from the
parent DataFrame, regardless of the predicate used in the filter. What are the
consequences, i.e. in which scenarios does keeping nullable=true have an impact?
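
For context, a minimal sketch of the behaviour being discussed, plus one hypothetical downstream consumer that keys off the nullable flag (the {{to_ddl}} helper below is illustrative only, not a Spark API):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([{"a": 1}, {"a": None}], schema="a:int")

filtered = df.where(F.col("a").isNotNull())
# The filter removes the null rows, but the schema still reports nullable=true,
# whatever predicate is used.
print(filtered.schema["a"].nullable)  # True

# Hypothetical downstream consumer: generating DDL for a target table from the
# DataFrame schema. Because nullable stays true, the NOT NULL constraint is
# lost even though the filtered data can no longer contain nulls.
def to_ddl(schema):
    return ", ".join(
        f"{f.name} {f.dataType.simpleString()}" + ("" if f.nullable else " NOT NULL")
        for f in schema.fields
    )

print(to_ddl(filtered.schema))  # "a int"  (no NOT NULL)
{code}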

> Filtering by isNotNull should return DataFrame with nullable=False
> ------------------------------------------------------------------
>
>                 Key: SPARK-54302
>                 URL: https://issues.apache.org/jira/browse/SPARK-54302
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.7
>            Reporter: Maxim Martynov
>            Priority: Major
>
> I have a DataFrame with a schema like this:
> {code:python}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
> df = spark.createDataFrame([{"a": 1},{"a": None}], schema="a:int")
> df.printSchema()
> """
> root
>  |-- a: integer (nullable = true)
> """
> df.where(df.a.isNotNull()).printSchema()
> """
> root
>  |-- a: integer (nullable = true)
> """
> {code}
> Currently, filters applied to a DataFrame don't change its schema. To make the 
> column non-nullable I have to use coalesce:
> {code:python}
> import pyspark.sql.functions as F
> df.where(df.a.isNotNull()).select(F.coalesce(df.a, F.lit(0))).printSchema()
> """
> root
>  |-- coalesce(a, 0): integer (nullable = false)
> """
> {code}
> But I have to choose the {{F.lit(...)}} value based on the column type, even 
> though it will never be used.
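
To illustrate the type dependence mentioned in the report, here is a minimal sketch (the column names and placeholder literals are my own examples, not from the report): with the coalesce workaround, every column type needs its own matching placeholder, even though the isNotNull filter guarantees it is never used.

{code:python}
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df2 = spark.createDataFrame(
    [("x", 1.5), (None, None)],
    schema="s: string, d: double",
)

# Each column needs a placeholder literal of a matching type, chosen per column.
non_null = df2.where(F.col("s").isNotNull() & F.col("d").isNotNull()).select(
    F.coalesce(F.col("s"), F.lit("")).alias("s"),   # string placeholder
    F.coalesce(F.col("d"), F.lit(0.0)).alias("d"),  # double placeholder
)
non_null.printSchema()
# root
#  |-- s: string (nullable = false)
#  |-- d: double (nullable = false)
{code}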



