[https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401667#comment-16401667]
ASF GitHub Bot commented on PARQUET-968:
----------------------------------------
BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-373654480
@costimuraru I opened a PR against your repo that adds a flag for writing
with the specs-compliant schemas:
https://github.com/costimuraru/parquet-mr/pull/2. The flag is off by default
to preserve backward compatibility.
I believe this is the safest way forward: it lets this PR be merged without
breaking backward compatibility, and the flag's default could be switched to
true in a future major release of Parquet.
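For reference, here is a minimal Python sketch that renders the two schema shapes for the column names used in the Spark output in this comment. The specs-compliant shapes follow the three-level LIST and MAP structures in the Parquet LogicalTypes specification; the legacy shapes (a bare `repeated` field, and an unannotated repeated key/value group for a protobuf map) are an assumption about what ProtoParquet wrote before this change. All helper names are hypothetical.

```python
# Hypothetical helpers that render Parquet message-type fragments.
# "legacy_*" shapes are an ASSUMPTION about pre-flag ProtoParquet output;
# "compliant_*" shapes follow the Parquet LogicalTypes spec.


def legacy_list(name: str, elem_type: str) -> str:
    # Bare repeated primitive: no LIST annotation, no wrapper groups.
    return f"repeated {elem_type} {name};"


def compliant_list(name: str, elem_type: str) -> str:
    # Three-level LIST: optional annotated outer group, a repeated
    # "list" group, and the "element" field inside it.
    return (
        f"optional group {name} (LIST) {{\n"
        "  repeated group list {\n"
        f"    required {elem_type} element;\n"
        "  }\n"
        "}"
    )


def legacy_map(name: str, key_type: str, value_type: str) -> str:
    # A protobuf map is a repeated MapEntry message; written as a plain
    # repeated group, readers see a list of (key, value) structs.
    return (
        f"repeated group {name} {{\n"
        f"  optional {key_type} key;\n"
        f"  optional {value_type} value;\n"
        "}"
    )


def compliant_map(name: str, key_type: str, value_type: str) -> str:
    # MAP annotation on an optional outer group, a repeated "key_value"
    # group, and a required key as the spec demands.
    return (
        f"optional group {name} (MAP) {{\n"
        "  repeated group key_value {\n"
        f"    required {key_type} key;\n"
        f"    optional {value_type} value;\n"
        "  }\n"
        "}"
    )


if __name__ == "__main__":
    print(legacy_map("nonEmptyMap", "int32", "int32"))
    print(compliant_map("nonEmptyMap", "int32", "int32"))
```

The key difference is the annotation: without `(MAP)` a reader has no way to know the repeated group is a map, so it can only surface it as a list of structs.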
You can find a test suite at https://github.com/BenoitHanotte/parquet-968
that validates the changes in this PR and highlights how Spark interprets the
old schemas versus the new (specs-compliant) ones:
- default behavior (flag set to false; backward compatible):
```
+-------------+----------------+--------+----------------+
|emptyRepeated|nonEmptyRepeated|emptyMap| nonEmptyMap|
+-------------+----------------+--------+----------------+
| []| [1, 1]| []|[[1, 1], [2, 2]]|
+-------------+----------------+--------+----------------+
```
- specs-compliant schemas (flag set to true):
```
+-------------+----------------+--------+----------------+
|emptyRepeated|nonEmptyRepeated|emptyMap| nonEmptyMap|
+-------------+----------------+--------+----------------+
| null| [1, 1]| null|[1 -> 1, 2 -> 2]|
+-------------+----------------+--------+----------------+
```
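Maps are interpreted differently: with the non-compliant schemas, Spark reads
them as lists of key-value tuples rather than as maps. The default values
differ as well: empty lists and maps are read back as empty collections with
the old schemas but as null with the specs-compliant ones.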
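Both differences can be illustrated with a toy Python model of how a reader materializes each column. This is a sketch, not parquet-mr or Spark code: it assumes, consistent with the output above, that the writer skips empty repeated fields and maps entirely, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy model, NOT parquet-mr or Spark code. "written" is what the writer
# emitted for one record: a list of values, or None if the field was skipped.


def write_field(values: list) -> Optional[list]:
    # ASSUMPTION: empty protobuf repeated fields and maps are not written.
    return values if values else None


def read_legacy_repeated(written: Optional[list]) -> list:
    # A bare `repeated` field has no "absent" state, so zero occurrences
    # can only come back as an empty list.
    return list(written or [])


def read_compliant_list(written: Optional[list]) -> Optional[list]:
    # The outer LIST group is optional: a skipped field reads as null (None).
    return None if written is None else list(written)


def read_legacy_map(written: Optional[List[Tuple[int, int]]]) -> List[Tuple[int, int]]:
    # No MAP annotation: entries come back as a list of (key, value) structs.
    return [tuple(kv) for kv in (written or [])]


def read_compliant_map(written: Optional[List[Tuple[int, int]]]) -> Optional[Dict[int, int]]:
    # MAP-annotated optional group: a skipped field reads as null,
    # otherwise the entries become a key -> value mapping.
    return None if written is None else dict(written)


if __name__ == "__main__":
    entries = [(1, 1), (2, 2)]
    print(read_legacy_repeated(write_field([])), read_compliant_list(write_field([])))
    # [] None
    print(read_legacy_map(write_field(entries)))
    # [(1, 1), (2, 2)]
    print(read_compliant_map(write_field(entries)))
    # {1: 1, 2: 2}
```

The point the model captures is that only the specs-compliant layout has an optional outer group, so only it can distinguish "absent" from "empty"; with a bare repeated field, both collapse to an empty list.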
@lukasnalezenec @qinghui-xu, feel free to have a look as well; we would like
to get this PR merged once all concerns are addressed.
Thanks!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
> Add Hive/Presto support in ProtoParquet
> ---------------------------------------
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
> Issue Type: Task
> Reporter: Constantin Muraru
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)