etseidl commented on code in PR #8376:
URL: https://github.com/apache/arrow-rs/pull/8376#discussion_r2365345007
##########
parquet/src/file/metadata/thrift_gen.rs:
##########
@@ -909,6 +819,341 @@ impl<'a, R: ThriftCompactInputProtocol<'a>> ReadThrift<'a, R> for ParquetMetaDat
}
}
+thrift_struct!(
+ pub(crate) struct IndexPageHeader {}
+);
+
+thrift_struct!(
+pub(crate) struct DictionaryPageHeader {
+ /// Number of values in the dictionary
+ 1: required i32 num_values;
+
+ /// Encoding using this dictionary page
+ 2: required Encoding encoding
+
+ /// If true, the entries in the dictionary are sorted in ascending order
+ 3: optional bool is_sorted;
+}
+);
+
+// Statistics for the page header. This is separate because of the differing lifetime requirements
+// for page handling vs column chunk. Once we start writing column chunks this might need to be
Review Comment:
There is a thrift `Statistics` field on both the column metadata and the
page header. For the former I can use the `Statistics<'a>` struct, which
borrows the min/max fields as slices. The page header reader cannot hand out
slices, so it needs the same struct but with owned `Vec`s for min/max. I can
try to make this explanation clearer.
Thankfully we can now skip reading this field altogether and avoid the
allocation cost.
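To illustrate the distinction, here is a minimal sketch, not the code in this PR. All names (`BorrowedStats`, `OwnedStats`, `read_borrowed`, `read_owned`, `skip_stats`) are hypothetical, and the wire format is a toy one-byte length prefix rather than the Thrift compact protocol. It also assumes the page reader pulls bytes from a `std::io::Read` stream, which is one plausible reason slices are unavailable there.

```rust
use std::io::{self, Read};

/// Borrowed form (hypothetical stand-in for `Statistics<'a>`):
/// min/max point into the buffer that holds the encoded metadata.
#[derive(Debug, PartialEq)]
pub struct BorrowedStats<'a> {
    pub min_value: Option<&'a [u8]>,
    pub max_value: Option<&'a [u8]>,
}

/// Owned form (hypothetical): min/max are copied out of the stream.
#[derive(Debug, PartialEq)]
pub struct OwnedStats {
    pub min_value: Option<Vec<u8>>,
    pub max_value: Option<Vec<u8>>,
}

/// Toy encoding, NOT Thrift: one length byte followed by that many bytes.
/// Returns the field and the remaining input, both borrowed from `buf`.
fn split_field(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&len, rest) = buf.split_first()?;
    let len = len as usize;
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

/// Reading from an in-memory buffer: no allocation, the result borrows `buf`.
pub fn read_borrowed(buf: &[u8]) -> Option<BorrowedStats<'_>> {
    let (min, rest) = split_field(buf)?;
    let (max, _) = split_field(rest)?;
    Some(BorrowedStats { min_value: Some(min), max_value: Some(max) })
}

/// Reads one length-prefixed field from a stream into a freshly allocated Vec.
fn read_field<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 1];
    reader.read_exact(&mut len)?;
    let mut bytes = vec![0u8; len[0] as usize];
    reader.read_exact(&mut bytes)?;
    Ok(bytes)
}

/// Reading from a stream: there is nothing to borrow from, so each field
/// costs one allocation.
pub fn read_owned<R: Read>(reader: &mut R) -> io::Result<OwnedStats> {
    let min = read_field(reader)?;
    let max = read_field(reader)?;
    Ok(OwnedStats { min_value: Some(min), max_value: Some(max) })
}

/// Skipping the statistics: consume both fields without building a Vec for them.
pub fn skip_stats<R: Read>(reader: &mut R) -> io::Result<()> {
    for _ in 0..2 {
        let mut len = [0u8; 1];
        reader.read_exact(&mut len)?;
        let want = len[0] as u64;
        let copied = io::copy(&mut reader.by_ref().take(want), &mut io::sink())?;
        if copied != want {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated field"));
        }
    }
    Ok(())
}
```

In this sketch the borrowed reader ties its result to the lifetime of the input buffer, while the stream reader must own its bytes. Skipping sidesteps both concerns, which is the saving described above.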