etseidl opened a new issue, #6115:
URL: https://github.com/apache/arrow-rs/issues/6115

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   The writing of the thrift `ColumnMetaData` outside of the Parquet file 
footer was recently deprecated 
(https://github.com/apache/parquet-format/pull/440), as was the setting of the 
`ColumnChunk::file_offset` field. In addition, the `ColumnMetaData` currently 
written outside the footer has incorrect values for `dictionary_page_offset` and 
`data_page_offset`: they are relative to the start of the column chunk rather 
than absolute offsets from the start of the file.
   
   **Describe the solution you'd like**
   The current Parquet 
[spec](https://github.com/apache/parquet-format/blob/5a5c8948e60770f8a8356a8f5e616d5ae1079d4b/src/main/thrift/parquet.thrift#L870-L878)
 indicates the `file_offset` field should be set to 0, and `ColumnMetaData` 
should no longer be written inline with the data.
   
   **Describe alternatives you've considered**
   If the `ColumnMetaData` written outside the footer is not removed, its `dictionary_page_offset` and `data_page_offset` should be set to correct file-relative values.
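
   As a rough illustration of that alternative, here is a minimal sketch of rebasing chunk-relative offsets to absolute file offsets. `ChunkOffsets`, `to_file_offsets`, and `chunk_start` are hypothetical names used only for this example; they are not part of the `parquet` crate, and the actual writer code may be structured quite differently.

```rust
/// Hypothetical stand-in for the two offset fields of the thrift
/// `ColumnMetaData`; this is not a type from the `parquet` crate.
#[derive(Debug, Clone, PartialEq)]
struct ChunkOffsets {
    /// Absent when the column chunk has no dictionary page.
    dictionary_page_offset: Option<i64>,
    data_page_offset: i64,
}

/// Rebase offsets measured from the start of the column chunk so that
/// they are measured from the start of the file instead.
/// `chunk_start` is assumed to be the file position of the chunk's first byte.
fn to_file_offsets(relative: &ChunkOffsets, chunk_start: i64) -> ChunkOffsets {
    ChunkOffsets {
        dictionary_page_offset: relative
            .dictionary_page_offset
            .map(|offset| offset + chunk_start),
        data_page_offset: relative.data_page_offset + chunk_start,
    }
}

fn main() {
    // Offsets as currently written: relative to the start of the chunk.
    let relative = ChunkOffsets {
        dictionary_page_offset: Some(0),
        data_page_offset: 120,
    };
    // Assume the chunk begins right after the 4-byte "PAR1" magic.
    let absolute = to_file_offsets(&relative, 4);
    println!("{:?}", absolute);
}
```

   The preferred solution above avoids this bookkeeping entirely by not writing the extra `ColumnMetaData` and setting `file_offset` to 0.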
   
   
   

