Prasanna Saraswathi Krishnan created SPARK-29432:
-------------------------------------------------
Summary: nullable flag of new column changes when persisting a pyspark dataframe
Key: SPARK-29432
URL: https://issues.apache.org/jira/browse/SPARK-29432
Project: Spark
Issue Type: Question
Components: SQL
Affects Versions: 2.4.0
Environment: Spark 2.4.0-cdh6.1.1 (Cloudera distribution)
Python 3.7.3
Reporter: Prasanna Saraswathi Krishnan

When I add a new column to a dataframe with the {{withColumn}} function, the column is added with {{nullable=false}} by default. But when I save the dataframe, the flag changes to {{nullable=true}}. Is this the expected behavior? If so, why?

{code}
>>> l = [('Alice', 1)]
>>> df = spark.createDataFrame(l)
>>> df.printSchema()
root
 |-- _1: string (nullable = true)
 |-- _2: long (nullable = true)

>>> from pyspark.sql.functions import lit
>>> df = df.withColumn('newCol', lit('newVal'))
>>> df.printSchema()
root
 |-- _1: string (nullable = true)
 |-- _2: long (nullable = true)
 |-- newCol: string (nullable = false)

>>> df.write.saveAsTable('default.withcolTest', mode='overwrite')
>>> spark.sql("select * from default.withcolTest").printSchema()
root
 |-- _1: string (nullable = true)
 |-- _2: long (nullable = true)
 |-- newCol: string (nullable = true)
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)