alamb commented on code in PR #10946:
URL: https://github.com/apache/datafusion/pull/10946#discussion_r1642816182


##########
datafusion/core/src/datasource/physical_plan/parquet/statistics.rs:
##########
@@ -1041,21 +1043,20 @@ impl<'a> StatisticsConverter<'a> {
     pub fn data_page_row_counts<I>(
         &self,
         column_offset_index: &ParquetOffsetIndex,
-        row_group_metadatas: &[RowGroupMetaData],
+        row_group_metadatas: &'a [RowGroupMetaData],
         row_group_indices: I,
-    ) -> Result<ArrayRef>
+    ) -> Result<UInt64Array>
     where
         I: IntoIterator<Item = &'a usize>,
     {
-        let data_type = self.arrow_field.data_type();
-
         let Some(parquet_index) = self.parquet_index else {
-            return Ok(self.make_null_array(data_type, row_group_indices));
+            // no matching column found in parquet_index;
+            // thus we cannot extract page_locations in order to determine
+            // the row count on a per DataPage basis.
+            // We use `row_group_row_counts` instead.
+            return Self::row_group_row_counts(row_group_metadatas);

Review Comment:
   > So for now, we only change data_page_row_counts and merge this? I'd be fine with that.
   
   I think that is a good idea, and I will file a ticket to fix `row_group_row_counts` as a follow-on.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

