alamb commented on code in PR #6135:
URL: https://github.com/apache/arrow-rs/pull/6135#discussion_r1693929413
##########
parquet/src/file/metadata/mod.rs:
##########
@@ -569,7 +569,7 @@ pub struct ColumnChunkMetaData {
/// For example, `vec[0]` is the number of rows with level 0, `vec[1]` is the
/// number of rows with level 1, and so on.
///
-#[derive(Debug, Clone, PartialEq)]
+#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
Review Comment:
👍
##########
parquet/src/file/metadata/mod.rs:
##########
@@ -620,6 +620,12 @@ impl LevelHistogram {
}
}
+ /// Appends values from the other histogram to this histogram
Review Comment:
I found the "append" notion quite confusing which is why I didn't put it in
this struct at first
Basically without append we have the invariant that `histogram[i]`
represents the counts of level `i`
Using append breaks that invariant. If we are going to keep it, maybe we can
add some more comments like
```suggestion
/// Appends values from the other histogram to this histogram
///
/// For example, given `[1,2,3]` and `[4,5]` calling append will result
/// in `[1,2,3,4,5]` (not `[5,7,3]`)
```
##########
parquet/src/file/page_index/index.rs:
##########
@@ -40,13 +41,13 @@ pub struct PageIndex<T> {
///
/// `repetition_level_histogram[i]` is a count of how many values are at
repetition level `i`.
/// For example, `repetition_level_histogram[0]` indicates how many rows
the page contains.
- pub repetition_level_histogram: Option<Vec<i64>>,
+ pub repetition_level_histogram: Option<LevelHistogram>,
Review Comment:
👍 this is a good one -- I missed this
##########
parquet/src/file/metadata/mod.rs:
##########
@@ -1169,9 +1175,9 @@ pub struct ColumnIndexBuilder {
null_counts: Vec<i64>,
boundary_order: BoundaryOrder,
/// contains the concatenation of the histograms of all pages
- repetition_level_histograms: Option<Vec<i64>>,
+ repetition_level_histograms: Option<LevelHistogram>,
Review Comment:
because `ColumnIndexBuilder` is a builder for the `format::metadata` structs
(aka it is basically building up the serialized form of several
`LevelHistograms`, and needs append) I would find it clearer to use `Vec<i64>`
here
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]