[jira] [Commented] (PARQUET-951) Missing field id support in parquet metadata

ASF GitHub Bot (JIRA) Tue, 29 May 2018 10:04:15 -0700


    [ https://issues.apache.org/jira/browse/PARQUET-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493865#comment-16493865 ]


ASF GitHub Bot commented on PARQUET-951:
----------------------------------------

BenoitHanotte commented on issue #410: [PARQUET-951] Pull request for handling 
protobuf field id
URL: https://github.com/apache/parquet-mr/pull/410#issuecomment-392855680
 
 
   Hello @costimuraru @qinghui-xu @julienledem 
   The protobuf descriptor is already serialized in the file metadata 
(https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java#L132)
 and contains all the information required to map each protobuf field id to its 
name. Can't we leverage this instead of changing the way we set the field id in 
the parquet schema?
   Not only would this isolate the change to the protobuf part of the logic, it 
would also be backward compatible, since existing files already contain the 
descriptor in serialized form. We would then only need to set a flag at 
read time, instead of also having to add one at write time.
   If we instead set the parquet field ids from the protobuf ids, I don't think 
we could support schema compatibility for files written with a previous version 
of parquet, because the parquet schema in those files would be missing the 
required information.
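
   A minimal sketch of the read-time idea, under stated assumptions: plain 
dicts stand in for the id-to-name information carried by the descriptor stored 
in the file and by the reader's current message type. The function names and 
sample fields are hypothetical and no parquet-mr or protobuf API is used. It 
only illustrates how columns written under an old field name could be matched 
to the current name through the shared field id.

```python
# Hypothetical stand-ins: plain dicts model the id -> name information that a
# serialized protobuf descriptor carries. No parquet-mr or protobuf API is used.


def build_column_mapping(file_fields, reader_fields):
    """Map each column name in the file to the reader's current field name.

    file_fields:   {field id: name} taken from the descriptor stored in the file
    reader_fields: {field id: name} taken from the reader's current message type
    Fields whose id is unknown to the reader are left out of the mapping.
    """
    mapping = {}
    for field_id, file_name in file_fields.items():
        if field_id in reader_fields:
            mapping[file_name] = reader_fields[field_id]
    return mapping


def read_record(stored_row, file_fields, reader_fields):
    """Re-key a stored row (column name -> value) to the reader's field names."""
    mapping = build_column_mapping(file_fields, reader_fields)
    return {mapping[name]: value
            for name, value in stored_row.items()
            if name in mapping}


# Descriptor stored with an older file: field 2 was called "user_name".
old_descriptor = {1: "id", 2: "user_name", 3: "legacy_flag"}
# Current schema: field 2 was renamed, field 3 was dropped, field 4 was added.
new_descriptor = {1: "id", 2: "display_name", 4: "email"}

row = {"id": 42, "user_name": "alice", "legacy_flag": True}
print(read_record(row, old_descriptor, new_descriptor))
```

   Because the lookup goes through the id rather than the name, the renamed 
field is still resolved, and nothing beyond the descriptor already stored in 
the file is needed at write time.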

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Missing field id support in parquet metadata
> --------------------------------------------
>
>                 Key: PARQUET-951
>                 URL: https://issues.apache.org/jira/browse/PARQUET-951
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Qinghui Xu
>            Priority: Major
>
> Field ids are essential for some serialization frameworks such as protobuf, 
> where they are used to maintain forward/backward schema compatibility, which 
> cannot be achieved using field names. Currently, field ids are not persisted 
> in the file metadata.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
