[ 
https://issues.apache.org/jira/browse/PARQUET-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501327#comment-16501327
 ] 

ASF GitHub Bot commented on PARQUET-951:
----------------------------------------

lukasnalezenec commented on issue #410: [PARQUET-951] Pull request for handling 
protobuf field id
URL: https://github.com/apache/parquet-mr/pull/410#issuecomment-394592509
 
 
   Hi,
   We already write field ids to the schema:
   
https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java#L80
   
   2018-05-29 19:03 GMT+02:00 Benoît Hanotte <[email protected]>:
   
   > Hello @costimuraru <https://github.com/costimuraru> @qinghui-xu
   > <https://github.com/qinghui-xu> @julienledem
   > <https://github.com/julienledem>
   > As the protobuf descriptor is already serialized in the file metadata
   > (https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java#L132)
   > and contains all the information required to map the protobuf field id to
   > its name, can't we leverage this instead of changing the way we set the
   > field id in the parquet schema?
   > Not only would this isolate the change to the protobuf part of the logic,
   > it would also preserve backward compatibility, since existing files already
   > contain the descriptor in its serialized form. We would then only need to
   > set a flag at read time, instead of also having to add one when writing.
   > If we instead set the parquet field ids according to the protobuf ids, I
   > don't think we could support schema compatibility for files written with a
   > previous version of parquet, because the parquet schema of those files
   > would be missing the required information.
   
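For illustration, the id-based lookup discussed in the quoted message could be sketched roughly as follows. This is only a sketch under assumptions: plain maps from field id to field name stand in for the information a serialized protobuf descriptor would provide, and the class name `FieldIdResolver`, the method `fileNameToReaderName`, and the sample field names are all hypothetical, not part of parquet-mr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of parquet-mr: plain id -> name maps stand in
// for what a serialized protobuf descriptor would tell us about each schema.
public class FieldIdResolver {

    /**
     * Maps each field name used in the file (writer schema) to the name the
     * reader schema uses for the same field id. Ids the reader does not know
     * are skipped.
     */
    static Map<String, String> fileNameToReaderName(
            Map<Integer, String> writerFields,
            Map<Integer, String> readerFields) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : writerFields.entrySet()) {
            String readerName = readerFields.get(entry.getKey());
            if (readerName != null) {
                renames.put(entry.getValue(), readerName);
            }
        }
        return renames;
    }

    public static void main(String[] args) {
        // Assumed example: field 2 was renamed from "mail" to "email"
        // after the file was written, and field 3 was added later.
        Map<Integer, String> writer = new LinkedHashMap<>();
        writer.put(1, "user_id");
        writer.put(2, "mail");

        Map<Integer, String> reader = new LinkedHashMap<>();
        reader.put(1, "user_id");
        reader.put(2, "email");
        reader.put(3, "country");

        // prints {user_id=user_id, mail=email}
        System.out.println(fileNameToReaderName(writer, reader));
    }
}
```

Because the lookup is keyed on the id, the rename of field 2 is resolved without touching the names stored in the file's schema, which is the property the quoted message relies on for files that were already written.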

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Missing field id support in parquet metadata
> --------------------------------------------
>
>                 Key: PARQUET-951
>                 URL: https://issues.apache.org/jira/browse/PARQUET-951
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Qinghui Xu
>            Priority: Major
>
> Field ids are essential for some serialization frameworks such as protobuf,
> where they are used to keep schemas forward/backward compatible, which cannot
> be achieved by using field names. Currently, field ids are not persisted in
> the file metadata.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
