vustef commented on code in PR #8715:
URL: https://github.com/apache/arrow-rs/pull/8715#discussion_r2524020441
##########
parquet/src/file/metadata/thrift/mod.rs:
##########
@@ -858,6 +867,55 @@ pub(crate) fn parquet_metadata_from_bytes(
Ok(ParquetMetaData::new(fmd, row_groups))
}
+/// Assign [`RowGroupMetaData::ordinal`] if it is missing.
+#[derive(Debug, Default)]
+pub(crate) struct OrdinalAssigner {
+ first_has_ordinal: bool,
+}
+
+impl OrdinalAssigner {
+ fn new() -> Self {
+ Default::default()
+ }
+
+ /// Sets [`RowGroupMetaData::ordinal`] if it is missing.
+ ///
+ /// # Arguments
+ /// - actual_ordinal: The ordinal (index) of the row group being processed
+ /// in the file metadata.
+ /// - rg: The [`RowGroupMetaData`] to potentially modify.
+ ///
+ /// Ensures:
+ /// 1. If the first row group has an ordinal, all subsequent row groups
must
+ /// also have ordinals.
+ /// 2. If the first row group does NOT have an ordinal, all subsequent row
+ /// groups must also not have ordinals.
+ fn ensure(
+ &mut self,
+ actual_ordinal: i16,
+ mut rg: RowGroupMetaData,
+ ) -> Result<RowGroupMetaData> {
+ let rg_has_ordinal = rg.ordinal.is_some();
+ if actual_ordinal == 0 {
+ self.first_has_ordinal = rg_has_ordinal;
+ }
Review Comment:
This code should only error out if either ordinal is present in metadata for
some row group but not for others, or vice-versa. That shouldn't happen with
row group skipping, i.e. even if we skip, all the remaining row groups that we
iterate through would fulfil this condition, right?
What I'd like though, is that the `build_row_number_reader` fails if
skipping occurs. Or that it ensures that there's no row skipping if it's
invoked. Not sure how to ensure that at this point though.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]