etseidl commented on code in PR #8715:
URL: https://github.com/apache/arrow-rs/pull/8715#discussion_r2524998016


##########
parquet/src/file/metadata/thrift/mod.rs:
##########
@@ -858,6 +867,55 @@ pub(crate) fn parquet_metadata_from_bytes(
     Ok(ParquetMetaData::new(fmd, row_groups))
 }
 
+/// Assign [`RowGroupMetaData::ordinal`]  if it is missing.
+#[derive(Debug, Default)]
+pub(crate) struct OrdinalAssigner {
+    first_has_ordinal: bool,
+}
+
+impl OrdinalAssigner {
+    fn new() -> Self {
+        Default::default()
+    }
+
+    /// Sets [`RowGroupMetaData::ordinal`] if it is missing.
+    ///
+    /// # Arguments
+    /// - actual_ordinal: The ordinal (index) of the row group being processed
+    ///   in the file metadata.
+    /// - rg: The [`RowGroupMetaData`] to potentially modify.
+    ///
+    /// Ensures:
+    /// 1. If the first row group has an ordinal, all subsequent row groups 
must
+    ///    also have ordinals.
+    /// 2. If the first row group does NOT have an ordinal, all subsequent row
+    ///    groups must also not have ordinals.
+    fn ensure(
+        &mut self,
+        actual_ordinal: i16,
+        mut rg: RowGroupMetaData,
+    ) -> Result<RowGroupMetaData> {
+        let rg_has_ordinal = rg.ordinal.is_some();
+        if actual_ordinal == 0 {
+            self.first_has_ordinal = rg_has_ordinal;
+        }

Review Comment:
   > Because for example the loop that invokes ensure would be going for 
ordinal in 0..list_ident.size, whe size is going to be based only on the row 
groups after skipping.
   
   Ah, but here we're in a function whose input is the entire encoded footer. 
`list_ident.size` will always be the unfiltered number of row groups. So I 
think we'll either:
    - loop over `ordinal in 0..list_ident.size`, but skip decoding if `ordinal` 
not in a list
    - assuming an index, loop over a list of ordinals, decode the row group 
using a range from the index, and pass `ordinal` as `actual_ordinal`
   
   In either event, we'd pass `2` as `actual_ordinal` in my scenario above. But 
yeah, that's pretty far down the road. Guess we can solve it then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to