gene-bordegaray commented on code in PR #22657:
URL: https://github.com/apache/datafusion/pull/22657#discussion_r3401722093


##########
datafusion/catalog-listing/src/table.rs:
##########
@@ -690,12 +715,45 @@ impl ListingTable {
     /// Get the list of files for a scan as well as the file level statistics.
     /// The list is grouped to let the execution plan know how the files should
     /// be distributed to different threads / executors.
+    ///
+    /// If [`ListingOptions::output_partitioning`] is set, the returned file
+    /// groups preserve that declared partition count, including empty trailing
+    /// groups when needed, rather than using
+    /// [`ListingOptions::target_partitions`].

Review Comment:
   Addressed with tighter docs on ListingOptions::target_partitions / 
output_partitioning. The current behavior is that output_partitioning is 
authoritative when set, and target_partitions is only used when no output 
partitioning is declared.



##########
datafusion/catalog-listing/src/table.rs:
##########
@@ -505,12 +506,31 @@ impl TableProvider for ListingTable {
         // at the same time. This is because the limit should be applied after 
the filters are applied.
         let statistic_file_limit = if filters.is_empty() { limit } else { None 
};
 
+        let declared_output_partitioning = if partition_filters.is_empty() {
+            self.options.output_partitioning.clone()
+        } else {
+            // Partition pruning can remove files before grouping. Without a
+            // stable file-to-declared-partition mapping, regrouping the
+            // remaining files could shift them into the wrong partition index.
+            None

Review Comment:
   Addressed. For declared output partitioning, listing now keeps the full file 
set through file-group assignment and applies partition filters within each 
declared group afterward. That preserves the declared partition indexes while 
still pruning files.



##########
datafusion/catalog-listing/src/table.rs:
##########
@@ -505,12 +506,31 @@ impl TableProvider for ListingTable {
         // at the same time. This is because the limit should be applied after 
the filters are applied.
         let statistic_file_limit = if filters.is_empty() { limit } else { None 
};
 
+        let declared_output_partitioning = if partition_filters.is_empty() {
+            self.options.output_partitioning.clone()
+        } else {
+            // Partition pruning can remove files before grouping. Without a
+            // stable file-to-declared-partition mapping, regrouping the
+            // remaining files could shift them into the wrong partition index.
+            None
+        };
+        let target_partitions = declared_output_partitioning
+            .as_ref()
+            .map(Partitioning::partition_count)
+            .unwrap_or(self.options.target_partitions);
+

Review Comment:
   Addressed. with_output_partitioning aligns target_partitions from the 
declared partition count, and scan planning validates the final file group 
count against output_partitioning.partition_count() before building the scan.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to