[
https://issues.apache.org/jira/browse/PARQUET-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493865#comment-16493865
]
ASF GitHub Bot commented on PARQUET-951:
----------------------------------------
BenoitHanotte commented on issue #410: [PARQUET-951] Pull request for handling
protobuf field id
URL: https://github.com/apache/parquet-mr/pull/410#issuecomment-392855680
Hello @costimuraru @qinghui-xu @julienledem
As the protobuf descriptor is already serialized in the file metadata
(https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java#L132)
and contains all the information required to map the protobuf field id to its
name, can't we leverage this instead of changing the way we set the field id in
the parquet schema?
Not only would this isolate the change to the protobuf part of the logic, it
would also preserve backward compatibility, since existing files already contain
the descriptor in its serialized form. In this case we would only need to set a
flag at read time, instead of also having to add a flag when writing.
If we instead set the parquet field ids according to the protobuf ids, I
don't think we would be able to support schema compatibility for files written
with a previous version of parquet, because the parquet schema in those files
would be missing the required information.
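To make the read-time idea concrete, here is a minimal sketch rather than a proposed implementation. It assumes the id-to-name pairs have already been extracted from the descriptor stored in the file metadata (the writer side) and from the descriptor of the class requested at read time (the reader side). The class name `FieldIdResolver`, the method `readerToWriterNames`, and the sample field names are hypothetical and do not exist in parquet-mr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of parquet-mr: matches fields by protobuf
// field id instead of by name.
public class FieldIdResolver {

    /**
     * For every field the reader knows, look up the name the writer used for
     * the same field id. Ids unknown to the writer are skipped, which
     * corresponds to a field that was added after the file was written.
     */
    public static Map<String, String> readerToWriterNames(
            Map<Integer, String> writerFields,
            Map<Integer, String> readerFields) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> reader : readerFields.entrySet()) {
            String writerName = writerFields.get(reader.getKey());
            if (writerName != null) {
                mapping.put(reader.getValue(), writerName);
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Assumed to come from the descriptor serialized in the file metadata.
        Map<Integer, String> writer = new LinkedHashMap<>();
        writer.put(1, "user_id");
        writer.put(2, "email");

        // Assumed to come from the descriptor of the class used for reading:
        // field 1 was renamed and field 3 was added later.
        Map<Integer, String> reader = new LinkedHashMap<>();
        reader.put(1, "account_id");
        reader.put(2, "email");
        reader.put(3, "phone");

        // prints {account_id=user_id, email=email}
        System.out.println(readerToWriterNames(writer, reader));
    }
}
```

With such a mapping, the reader could request the columns under the names found in the file even after a rename, without the parquet schema itself carrying any field id.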
> Missing field id support in parquet metadata
> --------------------------------------------
>
> Key: PARQUET-951
> URL: https://issues.apache.org/jira/browse/PARQUET-951
> Project: Parquet
> Issue Type: Bug
> Reporter: Qinghui Xu
> Priority: Major
>
> Field ids are essential for some serialization frameworks such as protobuf,
> where they are used to keep schema forward/backward compatibility, which
> cannot be achieved by using field names. Currently field ids are not persisted
> as file metadata.