[ https://issues.apache.org/jira/browse/SPARK-11319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014098#comment-15014098 ]

Harry Brundage commented on SPARK-11319:
----------------------------------------

Forgive my frankness, but that is ridiculous. It means anyone reading a JSON 
or CSV file must run their own validation pass over the data before handing 
it to SparkSQL. For anyone working with a datasource they don't trust 
completely, this renders the entire loader layer of SparkSQL useless and 
forces each user to implement their own. You have an opportunity to notify 
users when their schema expectations are violated; instead you silently 
accept the bad data and thereby encourage data quality issues. Every 
database worth its salt validates data on input!
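The validation pass described above can be sketched in plain Python. This is a hypothetical helper, not part of the Spark API; it mimics the nullability check one would have to run before calling sqlContext.createDataFrame, using a simplified schema of (field name, nullable) pairs:

```python
# Hypothetical pre-ingestion check: reject any row carrying None in a
# field the schema declares non-nullable. No PySpark dependency; the
# simplified schema stands in for StructType/StructField metadata.

def validate_rows(rows, schema):
    """rows: iterable of tuples; schema: list of (name, nullable) pairs."""
    for i, row in enumerate(rows):
        for value, (name, nullable) in zip(row, schema):
            if value is None and not nullable:
                raise ValueError(
                    "row %d: null in non-nullable field %r" % (i, name))
    return rows

# Mirrors the ticket's example: field "a" is declared non-nullable,
# so a row of (None,) is rejected instead of silently accepted.
schema = [("a", False)]
try:
    validate_rows([(None,)], schema)
except ValueError as e:
    print(e)  # row 0: null in non-nullable field 'a'
```

This is exactly the per-user boilerplate the comment objects to having to write.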

> PySpark silently accepts null values in non-nullable DataFrame fields.
> ----------------------------------------------------------------------
>
>                 Key: SPARK-11319
>                 URL: https://issues.apache.org/jira/browse/SPARK-11319
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>            Reporter: Kevin Cox
>
> Running the following code with a null value in a non-nullable column 
> silently works. This makes the code incredibly hard to trust.
> {code}
> In [2]: from pyspark.sql.types import *
> In [3]: sqlContext.createDataFrame([(None,)], StructType([StructField("a", TimestampType(), False)])).collect()
> Out[3]: [Row(a=None)]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
