vustef commented on code in PR #8715:
URL: https://github.com/apache/arrow-rs/pull/8715#discussion_r2522799979
##########
parquet/src/file/metadata/thrift/mod.rs:
##########
@@ -858,6 +867,55 @@ pub(crate) fn parquet_metadata_from_bytes(
Ok(ParquetMetaData::new(fmd, row_groups))
}
+/// Assign [`RowGroupMetaData::ordinal`] if it is missing.
+#[derive(Debug, Default)]
+pub(crate) struct OrdinalAssigner {
+ first_has_ordinal: bool,
+}
+
+impl OrdinalAssigner {
+ fn new() -> Self {
+ Default::default()
+ }
+
+ /// Sets [`RowGroupMetaData::ordinal`] if it is missing.
+ ///
+ /// # Arguments
+ /// - actual_ordinal: The ordinal (index) of the row group being processed
+ /// in the file metadata.
+ /// - rg: The [`RowGroupMetaData`] to potentially modify.
+ ///
+ /// Ensures:
+ /// 1. If the first row group has an ordinal, all subsequent row groups
must
+ /// also have ordinals.
+ /// 2. If the first row group does NOT have an ordinal, all subsequent row
+ /// groups must also not have ordinals.
+ fn ensure(
+ &mut self,
+ actual_ordinal: i16,
+ mut rg: RowGroupMetaData,
+ ) -> Result<RowGroupMetaData> {
+ let rg_has_ordinal = rg.ordinal.is_some();
+ if actual_ordinal == 0 {
+ self.first_has_ordinal = rg_has_ordinal;
+ }
Review Comment:
But then enumeration won't work at all, and we'd have to rely on ordinal in
the metadata? Actually, row numbers feature won't work at all, since it relies
on having information (sizes in rows) about all row groups to figure out first
row index of the row group it reads. Unless some trick that I'm not aware of is
used.
Note that row numbers have to be stable across queries (i.e. independent of
whether there was filtering), otherwise we would've implemented them on the
client side by just enumerating rows.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]