JFinis commented on PR #242:
URL: https://github.com/apache/parquet-format/pull/242#issuecomment-2152792995

   Interesting, so this feature is basically used to create a file comparable
   to an Iceberg manifest. I see that it can be used for that.
   
   Design-wise, I'm not the biggest fan of special-casing this through an
   extra field instead of just storing a Parquet file that has all information
   in normal Parquet columns (like a Delta Lake checkpoint Parquet file), but
   the design is the way it is. I do see that this field can be used this
   way, so I guess there is a valid use case for it, and it probably needs to
   be maintained for backward compatibility.
   
   Cheers,
   Jan
   
   On Thu, 6 June 2024 at 16:08, Rok Mihevc <
   ***@***.***> wrote:
   
   > ***@***.**** commented on this pull request.
   > ------------------------------
   >
   > In src/main/thrift/parquet.thrift
   > <https://github.com/apache/parquet-format/pull/242#discussion_r1629624783>:
   >
   > > @@ -885,6 +971,44 @@ struct ColumnChunk {
   >    9: optional binary encrypted_column_metadata
   >  }
   >
   > +struct ColumnChunkV3 {
   > +  /** File where column data is stored. **/
   > +  1: optional string file_path
   >
   > PyArrow provides write_metadata
   > <https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_metadata.html>
   > and parquet_dataset
   > <https://arrow.apache.org/docs/python/generated/pyarrow.dataset.parquet_dataset.html#pyarrow.dataset.parquet_dataset>
   > for such a use case.
   >
   


-- 
This is an automated message from the Apache Git Service.