[
https://issues.apache.org/jira/browse/SPARK-32639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chen Zhang updated SPARK-32639:
-------------------------------
Attachment: 000.snappy.parquet
> Support GroupType parquet mapkey field
> --------------------------------------
>
> Key: SPARK-32639
> URL: https://issues.apache.org/jira/browse/SPARK-32639
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.6, 3.0.0
> Reporter: Chen Zhang
> Priority: Major
> Attachments: 000.snappy.parquet
>
>
> I have a parquet file, and the MessageType recorded in the file is:
> {code:java}
> message parquet_schema {
>   optional group value (MAP) {
>     repeated group key_value {
>       required group key {
>         optional binary first (UTF8);
>         optional binary middle (UTF8);
>         optional binary last (UTF8);
>       }
>       optional binary value (UTF8);
>     }
>   }
> }{code}
>
> Reading the file with +spark.read.parquet("000.snappy.parquet")+ causes Spark
> to throw an exception while converting the Parquet MessageType to a Spark SQL
> StructType:
> {code:java}
> AssertionError(Map key type is expected to be a primitive type, but found...)
> {code}
>
> Reading the file with +spark.read.schema("value MAP<STRUCT<first:STRING,
> middle:STRING, last:STRING>, STRING>").parquet("000.snappy.parquet")+ returns
> the correct result.
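> The workaround can be sketched as follows (a minimal sketch, assuming an
> existing SparkSession named +spark+ and the attached file in the working
> directory; the DDL-string overload of +schema()+ is available since Spark 2.3):
> {code:java}
> // Supplying the schema explicitly bypasses the failing
> // MessageType-to-StructType conversion during schema inference.
> val df = spark.read
>   .schema("value MAP<STRUCT<first:STRING, middle:STRING, last:STRING>, STRING>")
>   .parquet("000.snappy.parquet")
> df.printSchema()
> df.show(truncate = false)
> {code}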
> According to the Parquet format documentation
> (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps),
> a map key in the Parquet format is not required to be a primitive type.
>
> Note: this parquet file was not written by Spark. Spark writes an additional
> sparkSchema string into the metadata of files it produces, and when reading
> such files it uses that embedded schema directly instead of converting the
> Parquet MessageType to a Spark SQL StructType, so the bug is not triggered.
> I will submit a PR later.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]