This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new 9621f8c  GH-541: Document status of file_path (#542)
9621f8c is described below

commit 9621f8cd460d5a74a4afd20cd028ad5847b6f235
Author: emkornfield <[email protected]>
AuthorDate: Mon Feb 2 16:14:54 2026 -0800

    GH-541: Document status of file_path (#542)
---
 src/main/thrift/parquet.thrift | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 7ff9b9f..a9e62cc 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -963,6 +963,21 @@ union ColumnCryptoMetaData {
 struct ColumnChunk {
   /** File where column data is stored.  If not set, assumed to be same file as
     * metadata.  This path is relative to the current file.
+    *
+    * As of December 2025, the only known use-case for this field is writing summary
+    * parquet files (i.e. "_metadata" files).  These files consolidate footers from
+    * multiple parquet files to allow for efficient reading of footers to avoid file
+    * listing costs and prune out files that do not need to be read based on statistics.
+    *
+    * These files do not appear to have ever been formally specified in the specification
+    * and are potentially problematic from a correctness perspective [1].
+    * 
+    * [1] https://lists.apache.org/thread/ootf2kmyg3p01b1bvplpvp4ftd1bt72d
+    *
+    * There is no other known usage of this field. Specifically, there are no known
+    * reference implementations that will read externally stored column data if this field is populated
+    * within a standard parquet file. Making use of the field for this purpose is
+    * not considered part of the Parquet specification.
+    * not considered part of the Parquet specification.
     **/
   1: optional string file_path
 
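
The resolution rule described in the comment above (unset means "same file as the metadata", otherwise a path relative to the current file) could be sketched roughly as follows. This is a minimal illustration, not part of the commit or of any Parquet library: the function name `resolve_column_chunk_file` is hypothetical, and reading "relative to the current file" as "relative to the directory that contains the metadata file" is an assumption.

```python
import posixpath
from typing import Optional


def resolve_column_chunk_file(metadata_file: str, file_path: Optional[str]) -> str:
    """Return the file assumed to hold a column chunk's data.

    Hypothetical helper, not a Parquet API. Assumes `file_path` is
    interpreted relative to the directory containing `metadata_file`.
    """
    if file_path is None:
        # Field not set: the data lives in the same file as the metadata.
        return metadata_file
    base_dir = posixpath.dirname(metadata_file)
    return posixpath.normpath(posixpath.join(base_dir, file_path))


# A "_metadata" summary file pointing at a data file in a subdirectory.
print(resolve_column_chunk_file("/data/table/_metadata", "year=2025/part-0.parquet"))
# A regular parquet file whose column chunks leave file_path unset.
print(resolve_column_chunk_file("/data/table/part-0.parquet", None))
```

Per the comment, only the first case (a summary file) corresponds to a known use of the field; a reader built on this sketch should not expect standard parquet files to carry externally stored column data.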
