[ https://issues.apache.org/jira/browse/SPARK-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076739#comment-14076739 ]

Yin Huai commented on SPARK-1649:
---------------------------------

It seems Hive supports null values in a Map. To be consistent with Hive, we will
also support them. I will introduce a boolean flag "valuesContainNull" to MapType.
For null map keys, Hive behaves inconsistently. Here are examples (using
"sbt/sbt hive/console"):
{code}
// The map itself drops every entry whose key is null...
runSqlHive("select map(null, 1, null, 2, null, 3, 4, null, 5, null) from src limit 1")
res6: Seq[String] = Buffer({4:null,5:null})
// ...but map_keys still reports a null key...
runSqlHive("select map_keys(map(null, 1, null, 2, null, 3, 4, null, 5, null)) from src limit 1")
res7: Seq[String] = Buffer([null,4,5])
// ...and map_values reports 3, the value of the last null-key entry.
runSqlHive("select map_values(map(null, 1, null, 2, null, 3, 4, null, 5, null)) from src limit 1")
res8: Seq[String] = Buffer([3,null,null])
{code}
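A minimal Scala sketch of how the proposed flag could be used, assuming a simplified, hypothetical type hierarchy: SimpleType, IntType, StringType and the helper conformsTo are illustrative names, not Spark's API; only valuesContainNull comes from this comment. The check accepts null values only when the flag is set and, following the decision on null keys in this comment, always rejects null keys:

```scala
// Hypothetical, simplified types for illustration only (not Spark's API).
sealed trait SimpleType
case object IntType extends SimpleType
case object StringType extends SimpleType

// MapType with the proposed flag; defaulting to true mirrors Hive,
// which accepts null map values.
final case class MapType(
    keyType: SimpleType,
    valueType: SimpleType,
    valuesContainNull: Boolean = true)

// Hypothetical check of a concrete map against its MapType:
// null keys are never allowed; null values only if valuesContainNull.
def conformsTo(m: Map[Any, Any], t: MapType): Boolean =
  m.keys.forall(_ != null) &&
    (t.valuesContainNull || m.values.forall(_ != null))

val lenient = MapType(IntType, IntType)
val strict = MapType(IntType, IntType, valuesContainNull = false)

// The map Hive returned in res6 above: {4:null,5:null}
val hiveResult = Map[Any, Any](4 -> null, 5 -> null)
println(conformsTo(hiveResult, lenient)) // true
println(conformsTo(hiveResult, strict))  // false
```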
Also, different Map implementations handle null keys in different ways (e.g.,
HashMap permits an entry with a null key, but TreeMap throws an NPE when a
user wants to insert an entry with a null key). So, I think we will not allow
null keys in a map.
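The difference between the two implementations can be checked directly. This sketch assumes the classes meant are java.util.HashMap and java.util.TreeMap (the comment does not name a package) and uses a TreeMap with natural key ordering:

```scala
import java.util.{HashMap => JHashMap, TreeMap => JTreeMap}

// java.util.HashMap permits a single entry whose key is null.
val hashMap = new JHashMap[String, Integer]()
hashMap.put(null, Integer.valueOf(1))
println(hashMap.get(null)) // 1

// java.util.TreeMap with natural ordering has to compare keys,
// so putting a null key throws a NullPointerException.
val treeMap = new JTreeMap[String, Integer]()
val treeMapRejectedNull =
  try { treeMap.put(null, Integer.valueOf(1)); false }
  catch { case _: NullPointerException => true }
println(treeMapRejectedNull) // true
```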

> Figure out Nullability semantics for Array elements and Map values
> ------------------------------------------------------------------
>
>                 Key: SPARK-1649
>                 URL: https://issues.apache.org/jira/browse/SPARK-1649
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.1.0
>            Reporter: Andre Schumacher
>            Priority: Critical
>
> For the underlying storage layer it would simplify things such as schema 
> conversions, predicate filter determination and such to record in the data 
> type itself whether a column can be nullable. So the DataType type could look 
> like this:
> abstract class DataType(nullable: Boolean = true)
> Concrete subclasses could then override the nullable val. Mostly this could 
> be left as the default but when types can be contained in nested types one 
> could optimize for, e.g., arrays with elements that are nullable and those 
> that are not.
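A rough Scala sketch of the proposal as quoted, assuming hypothetical leaf types IntegerType and StringType, a hypothetical ArrayType, and a hypothetical helper needsElementNullTracking; none of these names are taken from Spark. Every type carries nullable (default true), and a nested type reads its element type's flag to decide whether per-element null tracking is needed:

```scala
// Hypothetical types sketching the quoted proposal (not Spark's API).
// Every DataType carries `nullable`, defaulting to true.
abstract class DataType(val nullable: Boolean = true)

// Leaf types override the flag when needed.
final case class IntegerType(override val nullable: Boolean = true)
    extends DataType(nullable)
final case class StringType(override val nullable: Boolean = true)
    extends DataType(nullable)

// A nested type: whether its elements may be null is read
// from the element type's own flag.
final case class ArrayType(elementType: DataType, override val nullable: Boolean = true)
    extends DataType(nullable)

// Hypothetical helper: a storage layer could skip per-element null
// bookkeeping when the element type is declared non-nullable.
def needsElementNullTracking(t: ArrayType): Boolean = t.elementType.nullable

val nonNullInts = ArrayType(IntegerType(nullable = false))
val maybeNullStrings = ArrayType(StringType())
println(needsElementNullTracking(nonNullInts))      // false
println(needsElementNullTracking(maybeNullStrings)) // true
```

This places nullability on the element type itself; the comment above instead records it on the container, as valuesContainNull on MapType.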



--
This message was sent by Atlassian JIRA
(v6.2#6252)
