[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401822#comment-16401822 ]
ASF GitHub Bot commented on PARQUET-968:
----------------------------------------

BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-373654480

@costimuraru I opened a PR in your repo that adds a flag to enable writing with the specs-compliant schemas: https://github.com/costimuraru/parquet-mr/pull/2 . The flag is off by default to keep backward compatibility. I believe this is the safest way forward, and it will allow this PR to be merged without breaking backward compatibility. The flag could be set to true in a future major release of parquet.

You can find a test suite at https://github.com/BenoitHanotte/parquet-968 that highlights the differences in how the old schemas and the new (specs-compliant) ones are interpreted by Spark, and that validates the changes in this PR:

- default behavior (flag set to false, backward compatible):
```
+-------------+----------------+--------+----------------+
|emptyRepeated|nonEmptyRepeated|emptyMap|     nonEmptyMap|
+-------------+----------------+--------+----------------+
|           []|          [1, 1]|      []|[[1, 1], [2, 2]]|
+-------------+----------------+--------+----------------+
```

- specs-compliant schemas:
```
+-------------+----------------+--------+----------------+
|emptyRepeated|nonEmptyRepeated|emptyMap|     nonEmptyMap|
+-------------+----------------+--------+----------------+
|         null|          [1, 1]|    null|[1 -> 1, 2 -> 2]|
+-------------+----------------+--------+----------------+
```

You can see that maps are interpreted differently (with the non-compliant schemas they are read as lists of key-value tuples), as are the default values (empty lists and maps).

@lukasnalezenec @qinghui-xu feel free to have a look as well; we would like to have this PR merged if all the concerns are addressed.

Thanks!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Add Hive/Presto support in ProtoParquet
> ---------------------------------------
>
>                 Key: PARQUET-968
>                 URL: https://issues.apache.org/jira/browse/PARQUET-968
>             Project: Parquet
>          Issue Type: Task
>            Reporter: Constantin Muraru
>            Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
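
The difference between the two Spark outputs quoted in the comment above can be sketched as a small toy model. This is only an illustration of the observed read-side behavior, not ProtoParquet or Spark code; the function names `legacy_view` and `compliant_view` are hypothetical, and the model says nothing about how the writer actually encodes the data.

```python
# Toy model only: mirrors the two Spark tables quoted in the comment.
# The function names are hypothetical and do not exist in parquet-mr or Spark.
from typing import Dict, List, Optional, Tuple


def legacy_view(
    repeated: List[int], mapping: Dict[int, int]
) -> Tuple[List[int], List[List[int]]]:
    """Old (non-compliant) schemas: a map comes back as a list of
    [key, value] pairs, and empty collections stay empty lists."""
    return list(repeated), [[k, v] for k, v in mapping.items()]


def compliant_view(
    repeated: List[int], mapping: Dict[int, int]
) -> Tuple[Optional[List[int]], Optional[Dict[int, int]]]:
    """Specs-compliant schemas: a map comes back as a map, and empty
    collections come back as null (None here)."""
    return (list(repeated) or None), (dict(mapping) or None)


if __name__ == "__main__":
    print(legacy_view([], {}), legacy_view([1, 1], {1: 1, 2: 2}))
    print(compliant_view([], {}), compliant_view([1, 1], {1: 1, 2: 2}))
```

Under this model, a consumer that relied on empty lists or on maps arriving as lists of pairs would see different values once the flag is enabled, which is the backward-compatibility concern behind keeping the flag off by default.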