gszadovszky opened a new issue, #468: URL: https://github.com/apache/parquet-format/issues/468
### Describe the enhancement requested

There are a couple of issues with the specification of the logical type [MAP](https://github.com/apache/parquet-format/blob/apache-parquet-format-2.10.0/LogicalTypes.md#maps):

* Typo: "(...) `optional` or `required` and determines whether the **list** is nullable." Here "list" should read "map".
* Based on the spec, a nested `key` column is allowed. Does that make sense? Most engines/libraries require keys to be primitives.
* It is clear that `value` is not required; however, I did not find an implementation that handles this properly. Do we want to suggest anything for this case (e.g. treating the value column as null)?
* The [Backward-compatibility rules](https://github.com/apache/parquet-format/blob/apache-parquet-format-2.10.0/LogicalTypes.md#backward-compatibility-rules-1) suggest that `key` and `value` might not be named according to the spec, but they do not say how to identify them. I've seen multiple implementations (e.g. [the avro binding of parquet-java](https://github.com/apache/parquet-java/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L446)) that simply choose the `0`th element as the key and the `1`st one as the value without checking their names. This does not seem to be correct according to the spec.
* The spec mentions that `MAP_KEY_VALUE` might appear in place of `MAP` but does not mention its original purpose of tagging the `key_value` level.
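
To make the identification question concrete, here is a minimal sketch of one possible rule: look the fields up by name first and fall back to position only when no field is named `key`. It is not taken from parquet-java or any other library. The `Field` class and `resolve_map_fields` function are hypothetical stand-ins for a real schema API, and the fallback rule is only an assumption about what the spec could recommend.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a Parquet schema node."""
    name: str
    repetition: str  # "required", "optional", or "repeated"
    children: List["Field"] = field(default_factory=list)


def resolve_map_fields(key_value: Field) -> Tuple[Field, Optional[Field]]:
    """Pick the key and value fields of the repeated group inside a MAP.

    Assumed rule (not mandated by the spec): prefer a field named `key`;
    use positions 0 and 1 only when no field carries that name.
    The returned value is None when the group has a single field.
    """
    children = key_value.children
    if not 1 <= len(children) <= 2:
        raise ValueError("repeated map group must have one or two fields")

    names = [child.name for child in children]
    if "key" in names:
        key = children[names.index("key")]
        rest = [child for child in children if child is not key]
        return key, (rest[0] if rest else None)

    # Legacy naming: no field is called `key`, so fall back to position.
    return children[0], (children[1] if len(children) == 2 else None)


# Contrived inputs that show where name-based and positional lookup differ.
swapped = Field("key_value", "repeated",
                [Field("value", "optional"), Field("key", "required")])
legacy = Field("map", "repeated",
               [Field("k", "required"), Field("v", "optional")])
key_only = Field("key_value", "repeated", [Field("key", "required")])
```

With a purely positional lookup, `swapped` would yield the field named `value` as the key; the sketch above returns the field named `key` instead, and it reports a missing value as `None` for `key_only`.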
