vustef commented on code in PR #8715:
URL: https://github.com/apache/arrow-rs/pull/8715#discussion_r2524949788
##########
parquet/src/file/metadata/thrift/mod.rs:
##########
@@ -858,6 +867,55 @@ pub(crate) fn parquet_metadata_from_bytes(
Ok(ParquetMetaData::new(fmd, row_groups))
}
+/// Assign [`RowGroupMetaData::ordinal`] if it is missing.
+#[derive(Debug, Default)]
+pub(crate) struct OrdinalAssigner {
+ first_has_ordinal: bool,
+}
+
+impl OrdinalAssigner {
+ fn new() -> Self {
+ Default::default()
+ }
+
+ /// Sets [`RowGroupMetaData::ordinal`] if it is missing.
+ ///
+ /// # Arguments
+ /// - actual_ordinal: The ordinal (index) of the row group being processed
+ /// in the file metadata.
+ /// - rg: The [`RowGroupMetaData`] to potentially modify.
+ ///
+ /// Ensures:
+ /// 1. If the first row group has an ordinal, all subsequent row groups
must
+ /// also have ordinals.
+ /// 2. If the first row group does NOT have an ordinal, all subsequent row
+ /// groups must also not have ordinals.
+ fn ensure(
+ &mut self,
+ actual_ordinal: i16,
+ mut rg: RowGroupMetaData,
+ ) -> Result<RowGroupMetaData> {
+ let rg_has_ordinal = rg.ordinal.is_some();
+ if actual_ordinal == 0 {
+ self.first_has_ordinal = rg_has_ordinal;
+ }
Review Comment:
I see why we think differently. I was assuming then that the row group 2
would be the one to set `self.first_has_ordinal`. Because for example the loop
that invokes `ensure` would be going `for ordinal in 0..list_ident.size`, whe
size is going to be based only on the row groups after skipping.
Is that not going to happen?
Perhaps there might need some changes to be made once row group skipping is
implemented, not sure how best to guard against getting the intended behaviour
broken without catching it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]