[
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407735#comment-16407735
]
ASF GitHub Bot commented on PARQUET-968:
----------------------------------------
BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-374895431
@costimuraru I updated my PR to isolate the repetition-level change in a
separate commit.
However, with lists marked as `required` we are likely creating incorrect
files: the list's root wrapper is declared `required`, but protobuf's
`pb.getAllFields()` only returns fields that are set (i.e. it omits empty
lists), so the root wrapper of an empty list is never written despite being
`required`.
Here is how Spark would show the resulting dataset if lists are required:
```
+---------+------+-------------+----------------+--------+----------------+
|intNotSet|intSet|emptyRepeated|nonEmptyRepeated|emptyMap|     nonEmptyMap|
+---------+------+-------------+----------------+--------+----------------+
|     null|     1|           []|          [1, 1]|    null|[1 -> 1, 2 -> 2]|
+---------+------+-------------+----------------+--------+----------------+
```
We can see that empty maps and unset primitives are null, whereas the empty
list shows up as an empty list (`[]`) instead of null.
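To illustrate the problem described above, here is a minimal, self-contained
Java sketch. It does not use the real protobuf or parquet-mr APIs:
`setFieldsOnly` is a hypothetical stand-in that mimics how `getAllFields()`
skips unset fields and empty repeated fields, and `missingRequired` is a
hypothetical helper that lists the `required` fields a writer would never emit
if it only iterated over the set fields. The field names are taken from the
Spark output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RequiredListSketch {

    // Hypothetical stand-in for getAllFields(): keeps only "set" fields,
    // i.e. drops null values and empty lists.
    public static Map<String, Object> setFieldsOnly(Map<String, Object> message) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : message.entrySet()) {
            Object value = entry.getValue();
            if (value == null) {
                continue; // unset singular field
            }
            if (value instanceof List && ((List<?>) value).isEmpty()) {
                continue; // empty repeated field counts as "not set"
            }
            result.put(entry.getKey(), value);
        }
        return result;
    }

    // Hypothetical helper: required fields that a writer iterating only over
    // the set fields would never write.
    public static List<String> missingRequired(Set<String> requiredFields,
                                               Map<String, Object> message) {
        Set<String> written = setFieldsOnly(message).keySet();
        List<String> missing = new ArrayList<>();
        for (String name : requiredFields) {
            if (!written.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("intNotSet", null);
        message.put("intSet", 1);
        message.put("emptyRepeated", new ArrayList<Integer>());
        message.put("nonEmptyRepeated", Arrays.asList(1, 1));

        // Assumption for this sketch: both list wrappers are declared required.
        Set<String> required =
                new LinkedHashSet<>(Arrays.asList("emptyRepeated", "nonEmptyRepeated"));

        // Prints [emptyRepeated]: the required wrapper of the empty list
        // would be absent from the written record.
        System.out.println(missingRequired(required, message));
    }
}
```

In this simplified model, declaring the list wrapper `optional` instead would
make its absence for an empty list consistent with the schema.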
> Add Hive/Presto support in ProtoParquet
> ---------------------------------------
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
> Issue Type: Task
> Reporter: Constantin Muraru
> Priority: Major
>