alamb commented on code in PR #9678:
URL: https://github.com/apache/arrow-rs/pull/9678#discussion_r3065940477
##########
parquet/src/file/properties.rs:
##########
@@ -837,6 +851,22 @@ impl WriterPropertiesBuilder {
self
}
+ /// Should the writer should emit the `path_in_schema` element of the
+ /// `ColumnMetaData` Thrift struct.
+ ///
+ /// The `path_in_schema` field in the Thrift metadata is redundant and
wastes a great
+ /// deal of space. Parquet file footers can be made much smaller by
omitting this field.
+ /// Because the field was originally a mandatory field, this property
defaults to `true`
Review Comment:
If we choose to go this way I think it would help to give some more context
here on what types of readers would be affected (basically all the parquet-java
based readers prior to late 2026)
We could also perhaps provide a link to the discussion:
https://lists.apache.org/thread/czm2bk45wwtkhhpqxqvmx9dk5wkwk1kt
##########
parquet/src/file/properties.rs:
##########
@@ -837,6 +851,22 @@ impl WriterPropertiesBuilder {
self
}
+ /// Should the writer should emit the `path_in_schema` element of the
+ /// `ColumnMetaData` Thrift struct.
Review Comment:
I think it is worth making it more apparently that this will cause
incompatibilities with some older readers:
```suggestion
/// Should the writer should emit the `path_in_schema` element of the
/// `ColumnMetaData` Thrift struct. WARNING: this will make the parquet
/// file unreadable by some older parquet readers. See LINK HERE for
details
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]