danielcweeks commented on code in PR #542:
URL: https://github.com/apache/parquet-format/pull/542#discussion_r2636111324


##########
src/main/thrift/parquet.thrift:
##########
@@ -958,6 +958,24 @@ union ColumnCryptoMetaData {
 struct ColumnChunk {
   /** File where column data is stored.  If not set, assumed to be same file as
     * metadata.  This path is relative to the current file.
+    *
+    * As of December 2025, the only known use-case for this field is writing summary
+    * parquet files (i.e. "_metadata" files).  These files consolidate footers from
+    * multiple parquet files to allow for efficient reading of footers to avoid file
+    * listing costs and prune out files that do not need to be read based on statistics.
+    * This is legacy feature as modern table formats (e.g. Iceberg, Hudi and Delta Lake)
+    * are more scalable and serve effectively the same purpose.
+    *
+    * There is no other known use-case for this field. Specifically, there are no known
+    * readers that will read externally stored column data if this field is populated
+    * within a standard parquet file.
+    *
+    * Writers should not populate this field except for in parquet summary files. Readers
+    * should ensure this field is empty.

Review Comment:
   These statements are effectively removing it from the spec, which I feel is too strong of a position for this clarification.  I think it's fair to say that "readers should validate this field is empty if they do not support reading external column data", but prohibiting it is not a clarification.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.