Re: [PR] [thrift-remodel] Use new Thrift encoder/decoder for Parquet page headers [arrow-rs]

via GitHub Fri, 19 Sep 2025 23:12:58 -0700


etseidl commented on code in PR #8376:
URL: https://github.com/apache/arrow-rs/pull/8376#discussion_r2365345007



##########
parquet/src/file/metadata/thrift_gen.rs:
##########
@@ -909,6 +819,341 @@ impl<'a, R: ThriftCompactInputProtocol<'a>> ReadThrift<'a, R> for ParquetMetaDat
     }
 }
 
+thrift_struct!(
+    pub(crate) struct IndexPageHeader {}
+);
+
+thrift_struct!(
+pub(crate) struct DictionaryPageHeader {
+  /// Number of values in the dictionary
+  1: required i32 num_values;
+
+  /// Encoding using this dictionary page
+  2: required Encoding encoding
+
+  /// If true, the entries in the dictionary are sorted in ascending order
+  3: optional bool is_sorted;
+}
+);
+
+// Statistics for the page header. This is separate because of the differing lifetime requirements
+// for page handling vs column chunk. Once we start writing column chunks this might need to be

Review Comment:
   There is a Thrift `Statistics` field on both the column metadata and the
page header. For the column metadata I can use the `Statistics<'a>` struct,
which borrows slices for the min/max fields. The page header reader cannot use
slices, so it needs the same struct but with `Vec`s for the min/max fields. I
can try to make this explanation clearer in the code comment.
   
   Thankfully, we can now skip reading this field altogether and avoid the
allocation cost.
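
   To illustrate the borrowed-versus-owned distinction, here is a minimal
sketch. It is not the code in this PR: `BorrowedStats`, `OwnedStats`,
`read_borrowed`, `read_owned`, and `skip_bytes` are hypothetical names, the
byte layout is made up (no Thrift decoding is shown), and it assumes the page
header is read from a `std::io::Read` stream, which would explain why slices
into the input are not available there.

```rust
use std::io::{self, Read};

/// Hypothetical borrowed form: min/max point into a buffer that outlives the struct.
#[derive(Debug, PartialEq)]
struct BorrowedStats<'a> {
    min_value: Option<&'a [u8]>,
    max_value: Option<&'a [u8]>,
}

/// Hypothetical owned form: min/max are copied out, so no lifetime is needed.
#[derive(Debug, PartialEq)]
struct OwnedStats {
    min_value: Option<Vec<u8>>,
    max_value: Option<Vec<u8>>,
}

/// With the whole input in memory, the fields can borrow from `buf` (no allocation).
/// The layout (min bytes followed by max bytes) is an assumption for illustration.
fn read_borrowed(buf: &[u8], min_len: usize) -> BorrowedStats<'_> {
    let (min, max) = buf.split_at(min_len);
    BorrowedStats { min_value: Some(min), max_value: Some(max) }
}

/// With a stream, there is nothing to borrow from, so each field is allocated.
fn read_owned<R: Read>(reader: &mut R, min_len: usize, max_len: usize) -> io::Result<OwnedStats> {
    let mut min = vec![0u8; min_len];
    reader.read_exact(&mut min)?;
    let mut max = vec![0u8; max_len];
    reader.read_exact(&mut max)?;
    Ok(OwnedStats { min_value: Some(min), max_value: Some(max) })
}

/// Skipping the field instead: consume `len` bytes without keeping them.
fn skip_bytes<R: Read>(reader: &mut R, len: u64) -> io::Result<()> {
    let copied = io::copy(&mut reader.by_ref().take(len), &mut io::sink())?;
    if copied == len {
        Ok(())
    } else {
        Err(io::Error::from(io::ErrorKind::UnexpectedEof))
    }
}
```

   In this sketch `read_borrowed` ties its result to the lifetime of the input
buffer, `read_owned` pays for two `Vec` allocations, and `skip_bytes` pays for
neither, which is the saving the last paragraph above refers to.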



