etseidl commented on code in PR #8715:
URL: https://github.com/apache/arrow-rs/pull/8715#discussion_r2519929870
##########
parquet/src/file/metadata/thrift/encryption.rs:
##########
@@ -294,10 +295,19 @@ pub(crate) fn parquet_metadata_with_encryption(
file_decryptor = Some(file_decryptor_value);
}
- // decrypt column chunk info
+ // decrypt column chunk info and handle ordinal assignment
Review Comment:
I guess this doesn't hurt, but it shouldn't be necessary. The row groups
already passed through the `OrdinalAssigner` in `parquet_metadata_from_bytes`,
and `row_group_from_encrypted_thrift` re-uses the `ordinal` from the
`RowGroupMetaData` passed in.
##########
parquet/src/file/metadata/thrift/mod.rs:
##########
@@ -858,6 +867,55 @@ pub(crate) fn parquet_metadata_from_bytes(
Ok(ParquetMetaData::new(fmd, row_groups))
}
+/// Assign [`RowGroupMetaData::ordinal`] if it is missing.
+#[derive(Debug, Default)]
+pub(crate) struct OrdinalAssigner {
+ first_has_ordinal: bool,
+}
+
+impl OrdinalAssigner {
+ fn new() -> Self {
+ Default::default()
+ }
+
+ /// Sets [`RowGroupMetaData::ordinal`] if it is missing.
+ ///
+ /// # Arguments
+ /// - actual_ordinal: The ordinal (index) of the row group being processed
+ /// in the file metadata.
+ /// - rg: The [`RowGroupMetaData`] to potentially modify.
+ ///
+ /// Ensures:
+ /// 1. If the first row group has an ordinal, all subsequent row groups
must
+ /// also have ordinals.
+ /// 2. If the first row group does NOT have an ordinal, all subsequent row
+ /// groups must also not have ordinals.
+ fn ensure(
+ &mut self,
+ actual_ordinal: i16,
+ mut rg: RowGroupMetaData,
+ ) -> Result<RowGroupMetaData> {
+ let rg_has_ordinal = rg.ordinal.is_some();
+ if actual_ordinal == 0 {
+ self.first_has_ordinal = rg_has_ordinal;
+ }
Review Comment:
I wonder if `first_has_ordinal` should be an option, and then set it if it's
`None`. If/when we implement row group selection the first ordinal seen may not
be 0.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]