[ 
https://issues.apache.org/jira/browse/SPARK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104556#comment-14104556
 ] 

Michael Armbrust commented on SPARK-3036:
-----------------------------------------

Can you explain more about what you mean when you say we are breaking backwards 
compatibility?  It seems like newer versions of Spark SQL should always be able 
to read data written by older versions, as long as we support both versions of 
the schema.  Choosing between them when writing, based on valueContainsNull, 
seems like the best solution.
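
As a rough illustration of choosing the schema at write time, here is a small 
Python sketch that renders the Parquet message type for an int-to-int map 
column. The helper name map_schema and its parameters are hypothetical (they 
are not Spark SQL APIs); the output just mirrors the two schemas quoted in the 
issue description.

```python
def map_schema(column: str, value_contains_null: bool) -> str:
    """Render a Parquet message type for an int32 -> int32 map column.

    Hypothetical helper for illustration only; not a Spark SQL API.
    """
    # Values become optional only when the map may hold nulls.
    # Keys stay required in both variants.
    value_repetition = "optional" if value_contains_null else "required"
    return "\n".join([
        "message root {",
        f"  optional group {column} (MAP) {{",
        "    repeated group map (MAP_KEY_VALUE) {",
        "      required int32 key;",
        f"      {value_repetition} int32 value;",
        "    }",
        "  }",
        "}",
    ])


if __name__ == "__main__":
    # Schema a writer would pick for a map that may contain null values.
    print(map_schema("a", value_contains_null=True))
```

With value_contains_null=False the sketch yields the existing schema, so data 
without null values would keep the layout that older readers already expect.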

I think it is okay (though undesirable) for older versions of Spark SQL to be 
unable to read data written by newer versions, since this is unavoidable as 
we add features.
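
The asymmetry can be modelled with a toy Python check. This is only a sketch 
under assumptions: OLD_READER, NEW_READER, and can_read are made-up names, and 
real Spark SQL does not decide readability this way. It simply treats a reader 
as the set of value repetitions it understands.

```python
import re

# The two map schemas from the issue description.
OLD_SCHEMA = """message root {
  optional group a (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      required int32 value;
    }
  }
}"""
NEW_SCHEMA = OLD_SCHEMA.replace("required int32 value;", "optional int32 value;")

# Hypothetical model: a reader is the set of value repetitions it supports.
OLD_READER = {"required"}
NEW_READER = {"required", "optional"}


def value_repetition(schema: str) -> str:
    """Return 'required' or 'optional' for the map's value field."""
    match = re.search(r"(required|optional)\s+\w+\s+value;", schema)
    if match is None:
        raise ValueError("no map value field found in schema")
    return match.group(1)


def can_read(reader_supports: set, schema: str) -> bool:
    """True if the modelled reader understands the schema's value field."""
    return value_repetition(schema) in reader_supports


if __name__ == "__main__":
    for name, reader in (("old", OLD_READER), ("new", NEW_READER)):
        for label, schema in (("old", OLD_SCHEMA), ("new", NEW_SCHEMA)):
            print(f"{name} reader, {label} schema: {can_read(reader, schema)}")
```

In this model the newer reader accepts both schemas, while the older reader 
rejects only files written with optional values.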

> Add MapType containing null value support to Parquet.
> -----------------------------------------------------
>
>                 Key: SPARK-3036
>                 URL: https://issues.apache.org/jira/browse/SPARK-3036
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Takuya Ueshin
>            Priority: Blocker
>
> The current Parquet schema for {{MapType}} is as follows, regardless of 
> {{valueContainsNull}}:
> {noformat}
> message root {
>   optional group a (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required int32 key;
>       required int32 value;
>     }
>   }
> }
> {noformat}
> and if the map contains a {{null}} value, a runtime exception is thrown.
> To handle a {{MapType}} containing {{null}} values, the schema should be as 
> follows if {{valueContainsNull}} is {{true}}:
> {noformat}
> message root {
>   optional group a (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required int32 key;
>       optional int32 value;
>     }
>   }
> }
> {noformat}
> FYI:
> Hive's Parquet writer *always* uses the latter schema, but its reader can 
> read both schemas.
> NOTICE:
> This change will break backward compatibility when the schema is read from 
> Parquet metadata ({{"org.apache.spark.sql.parquet.row.metadata"}}).



