[ https://issues.apache.org/jira/browse/SPARK-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059570#comment-14059570 ]
Yin Huai commented on SPARK-1649:
---------------------------------
My PR for SPARK-2179 (https://github.com/apache/spark/pull/1346) introduces the
"containsNull" field on ArrayType. For Parquet, we still do not support
null values inside a Parquet array.
For the key and value of MapType, [~marmbrus] and I discussed it. We
think it is not semantically clear what a null means when it appears in the key
or value field (considering that a null is used to indicate a missing data
value). So, we decided that the key and value of a MapType should not contain
any null values, and we will not introduce containsNull to MapType.
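To make the decision concrete, here is a minimal sketch of the intended semantics. The class names mirror Spark SQL's types, but this is an illustrative Python model under the assumptions above (arrays carry a containsNull flag; maps reject nulls outright), not Spark's actual implementation:

```python
# Illustrative model of the nullability semantics discussed above.
# Names mirror Spark SQL's types; the validate() helpers are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class ArrayType:
    element_type: str
    contains_null: bool = True  # per SPARK-2179: may elements be null?

    def validate(self, values):
        # Reject null elements only when the schema says they cannot occur.
        if not self.contains_null and any(v is None for v in values):
            raise ValueError("null element in array with containsNull=false")
        return values

@dataclass(frozen=True)
class MapType:
    key_type: str
    value_type: str
    # Deliberately no containsNull field: per the discussion above, map
    # keys and values must never be null, since a null there has no clear
    # meaning as a "missing value" marker.

    def validate(self, entries):
        for k, v in entries.items():
            if k is None or v is None:
                raise ValueError("null key or value is not allowed in a map")
        return entries
```

Under this model, ArrayType("int", contains_null=True) accepts [1, None, 3], while the same data under contains_null=False raises an error, and MapType rejects any null key or value unconditionally.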
> Figure out Nullability semantics for Array elements and Map values
> ------------------------------------------------------------------
>
> Key: SPARK-1649
> URL: https://issues.apache.org/jira/browse/SPARK-1649
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.1.0
> Reporter: Andre Schumacher
> Priority: Critical
>
> For the underlying storage layer, it would simplify things such as schema
> conversions and predicate filter determination to record in the data type
> itself whether a column can be nullable. So the DataType type could look
> like this:
> abstract class DataType(nullable: Boolean = true)
> Concrete subclasses could then override the nullable val. Mostly this could
> be left as the default, but when types are contained in nested types one
> could optimize for, e.g., arrays whose elements are nullable versus those
> whose elements are not.
--
This message was sent by Atlassian JIRA
(v6.2#6252)