alamb commented on code in PR #9678:
URL: https://github.com/apache/arrow-rs/pull/9678#discussion_r3065940477


##########
parquet/src/file/properties.rs:
##########
@@ -837,6 +851,22 @@ impl WriterPropertiesBuilder {
         self
     }
 
+    /// Should the writer should emit the `path_in_schema` element of the
+    /// `ColumnMetaData` Thrift struct.
+    ///
+    /// The `path_in_schema` field in the Thrift metadata is redundant and 
wastes a great
+    /// deal of space. Parquet file footers can be made much smaller by 
omitting this field.
+    /// Because the field was originally a mandatory field, this property 
defaults to `true`

Review Comment:
   If we choose to go this way I think it would help to give some more context 
here on what types of readers would be affected (basically all the parquet-java 
based readers prior to late 2026)
   
   We could also perhaps provide a link to the discussion: 
https://lists.apache.org/thread/czm2bk45wwtkhhpqxqvmx9dk5wkwk1kt



##########
parquet/src/file/properties.rs:
##########
@@ -837,6 +851,22 @@ impl WriterPropertiesBuilder {
         self
     }
 
+    /// Should the writer should emit the `path_in_schema` element of the
+    /// `ColumnMetaData` Thrift struct.

Review Comment:
   I think it is worth making it more apparently that this will cause 
incompatibilities with some older readers:
   
   ```suggestion
       /// Should the writer should emit the `path_in_schema` element of the
       /// `ColumnMetaData` Thrift struct. WARNING: this will make the parquet 
       /// file unreadable by some older parquet readers. See LINK HERE for 
details
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to