[ https://issues.apache.org/jira/browse/SPARK-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059570#comment-14059570 ]

Yin Huai commented on SPARK-1649:
---------------------------------

My PR for SPARK-2179 (https://github.com/apache/spark/pull/1346) introduces the 
"containsNull" field to ArrayType. For Parquet, we still do not support 
null values inside a Parquet array.

Regarding the key and value of MapType, [~marmbrus] and I discussed it. We 
think it is not semantically clear what a null means when it appears in the key 
or value field (considering that a null is used to indicate a missing data value). 
So, we decided that the key and value of a MapType should not contain any null 
values, and we will not introduce containsNull to MapType.
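
As a rough illustration of the resulting API, here is a minimal sketch; the 
import path and exact constructor signatures have moved between releases 
(org.apache.spark.sql.catalyst.types at the time, org.apache.spark.sql.types 
in later releases), so treat the details as approximate:

    import org.apache.spark.sql.catalyst.types._

    // An array whose elements may be null vs. one whose elements may not.
    val nullableElements    = ArrayType(IntegerType, containsNull = true)
    val nonNullableElements = ArrayType(IntegerType, containsNull = false)

    // Per the decision above, MapType keys and values are treated as
    // non-null, so no containsNull flag is proposed for it.
    val scores = MapType(StringType, DoubleType)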

> Figure out Nullability semantics for Array elements and Map values
> ------------------------------------------------------------------
>
>                 Key: SPARK-1649
>                 URL: https://issues.apache.org/jira/browse/SPARK-1649
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.1.0
>            Reporter: Andre Schumacher
>            Priority: Critical
>
> For the underlying storage layer it would simplify things such as schema 
> conversions, predicate filter determination and such to record in the data 
> type itself whether a column can be nullable. So the DataType type could look 
> like this:
> abstract class DataType(nullable: Boolean = true)
> Concrete subclasses could then override the nullable val. Mostly this could 
> be left as the default but when types can be contained in nested types one 
> could optimize for, e.g., arrays with elements that are nullable and those 
> that are not.
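
To make the proposal quoted above concrete, here is a hypothetical sketch; the 
names are illustrative only, and this is not how Spark SQL ultimately models 
nullability (it records it on StructField and, after SPARK-2179, via 
ArrayType.containsNull):

    // Hypothetical sketch of the reporter's proposal, not actual Spark code.
    abstract class DataType(val nullable: Boolean = true)

    // Keeps the default: values of this type may be null.
    case class IntArrayType() extends DataType()

    // Overrides the default: elements are known to be non-null, so the
    // storage layer can skip null handling in schema conversion and
    // predicate filtering.
    case class NonNullIntArrayType() extends DataType(nullable = false)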



--
This message was sent by Atlassian JIRA
(v6.2#6252)
