danielcweeks commented on code in PR #542:
URL: https://github.com/apache/parquet-format/pull/542#discussion_r2673774670


##########
src/main/thrift/parquet.thrift:
##########
@@ -958,6 +958,21 @@ union ColumnCryptoMetaData {
 struct ColumnChunk {
   /** File where column data is stored.  If not set, assumed to be same file as
     * metadata.  This path is relative to the current file.
+    *
+    * As of December 2025, the only known use-case for this field is writing summary
+    * parquet files (i.e. "_metadata" files).  These files consolidate footers from
+    * multiple parquet files to allow for efficient reading of footers to avoid file
+    * listing costs and prune out files that do not need to be read based on statistics.
+    *
+    * These files do not appear to have ever been formally specified in the specification.
+    * and are potentially problematic from a correctness perspective [1].
+    * 
+    * [1] https://lists.apache.org/thread/ootf2kmyg3p01b1bvplpvp4ftd1bt72d
+    *
+    * There is no other known usage of this field. Specifically, there are no known
+    * reference implementations that will read externally stored column data if this field is populated
+    * within a standard parquet file. Making use of the field for this purpose is
+    * not considered part of the Parquet specification.

Review Comment:
   ```suggestion
       * within a standard parquet file. Making use of the field for this purpose is
       * discouraged and may be removed in a later revision of the specification.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

