alamb commented on code in PR #6045:
URL: https://github.com/apache/arrow-rs/pull/6045#discussion_r1676801505
##########
parquet/src/format.rs:
##########
@@ -3417,10 +3565,15 @@ pub struct ColumnMetaData {
/// Writers should write this field so readers can read the bloom filter
/// in a single I/O.
pub bloom_filter_length: Option<i32>,
+ /// Optional statistics to help estimate total memory when converted to
in-memory
Review Comment:
🤔 https://docs.rs/parquet/latest/parquet/format/struct.SortingColumn.html
appears to be publically exposed (and thus this is technically a breaking API
change).
Thus I think that means we couldn't merge this until August when master
opens for breaking changes
https://github.com/apache/arrow-rs/blob/master/CONTRIBUTING.md#breaking-changes
However, managing stacked PRs is going to get painful if we have to keep
them open for an extended period of time.
THus, despite my comments on
https://github.com/apache/arrow-rs/pull/5486#pullrequestreview-2171737191 to
the contrary I think it would be ok and to have the whole feature in a single
PR.
~So perhaps we can close this PR.~
Update: see below for an alternate suggestion
Sorry about that @etseidl . I am still getting used to the relatively new
"breaking API" workflow in this repo
##########
parquet/src/format.rs:
##########
@@ -3417,10 +3565,15 @@ pub struct ColumnMetaData {
/// Writers should write this field so readers can read the bloom filter
/// in a single I/O.
pub bloom_filter_length: Option<i32>,
+ /// Optional statistics to help estimate total memory when converted to
in-memory
Review Comment:
An alternative would be to make a feature branch that we could target /
review PRs to, and then I could merge the feature branch to main when ready.
@XiangpengHao, @Weijun-H, and I took this approach for StringView in DataFusion
and it worked
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]