[
https://issues.apache.org/jira/browse/SPARK-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076739#comment-14076739
]
Yin Huai commented on SPARK-1649:
---------------------------------
Hive appears to support null values in a Map. To be consistent with Hive, we
will support them as well; I will introduce a boolean "valuesContainNull" to
MapType. For null map keys, Hive behaves inconsistently. Here are examples
(using "sbt/sbt hive/console").
{code}
runSqlHive("select map(null, 1, null, 2, null, 3, 4, null, 5, null) from src limit 1")
res6: Seq[String] = Buffer({4:null,5:null})
runSqlHive("select map_keys(map(null, 1, null, 2, null, 3, 4, null, 5, null)) from src limit 1")
res7: Seq[String] = Buffer([null,4,5])
runSqlHive("select map_values(map(null, 1, null, 2, null, 3, 4, null, 5, null)) from src limit 1")
res8: Seq[String] = Buffer([3,null,null])
{code}
Also, different Map implementations handle null keys in different ways (e.g.
HashMap supports an entry with a null key, but TreeMap throws an NPE when a
user tries to insert an entry with a null key). So, I think we will not allow
null keys in a map.
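
To make the HashMap/TreeMap difference and the proposed flag concrete, here is a minimal sketch. MapTypeSketch and conforms are hypothetical names used only for illustration; they are not Spark's actual MapType or any existing API. The sketch assumes java.util.HashMap and a java.util.TreeMap with natural key ordering on Java 7 or later.

```scala
import java.util.{HashMap => JHashMap, TreeMap => JTreeMap}
import scala.util.Try

object NullKeyDemo {
  // Hypothetical stand-in for the proposed flag; not Spark's real MapType.
  final case class MapTypeSketch(valuesContainNull: Boolean)

  // Assumed rule from the comment above: null keys are never allowed,
  // null values are allowed only when valuesContainNull is true.
  def conforms(m: Map[String, Any], t: MapTypeSketch): Boolean = {
    val noNullKeys = !m.keys.exists(_ == null)
    val valuesOk = t.valuesContainNull || !m.values.exists(_ == null)
    noNullKeys && valuesOk
  }

  // java.util.HashMap stores an entry under a null key without complaint.
  val hashMapAcceptsNullKey: Boolean = {
    val m = new JHashMap[String, Integer]()
    Try(m.put(null, 1)).isSuccess
  }

  // A naturally ordered java.util.TreeMap has to compare keys, so
  // inserting a null key raises a NullPointerException.
  val treeMapAcceptsNullKey: Boolean = {
    val m = new JTreeMap[String, Integer]()
    Try(m.put(null, 1)).isSuccess
  }

  def main(args: Array[String]): Unit = {
    println(s"HashMap accepts null key: $hashMapAcceptsNullKey")
    println(s"TreeMap accepts null key: $treeMapAcceptsNullKey")
    val withNullValue = Map[String, Any]("a" -> null)
    println(conforms(withNullValue, MapTypeSketch(valuesContainNull = true)))
    println(conforms(withNullValue, MapTypeSketch(valuesContainNull = false)))
  }
}
```

Rejecting null keys at the type level sidesteps this implementation-dependent behavior, while the boolean records whether readers of a map must handle null values.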
> Figure out Nullability semantics for Array elements and Map values
> ------------------------------------------------------------------
>
> Key: SPARK-1649
> URL: https://issues.apache.org/jira/browse/SPARK-1649
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.1.0
> Reporter: Andre Schumacher
> Priority: Critical
>
> For the underlying storage layer it would simplify things such as schema
> conversions, predicate filter determination and such to record in the data
> type itself whether a column can be nullable. So the DataType type could look
> like this:
> abstract class DataType(nullable: Boolean = true)
> Concrete subclasses could then override the nullable val. Mostly this could
> be left as the default but when types can be contained in nested types one
> could optimize for, e.g., arrays with elements that are nullable and those
> that are not.