Re: [PR] Add declared file scan output partitioning [datafusion]

via GitHub Fri, 12 Jun 2026 01:23:19 -0700


gene-bordegaray commented on code in PR #22657:
URL: https://github.com/apache/datafusion/pull/22657#discussion_r3401723149



##########
datafusion/datasource/src/file_scan_config/mod.rs:
##########
@@ -206,6 +206,13 @@ pub struct FileScanConfig {
     /// If the number of file partitions > target_partitions, the file partitions will be grouped
     /// in a round-robin fashion such that number of file partitions = target_partitions.
     pub partitioned_by_file_group: bool,
+    /// Optional declared output partitioning of this file scan.
+    ///
+    /// Expressions are in terms of the full table schema, before scan
+    /// projection or filtering. If the partition count does not match the
+    /// number of file groups, [`DataSource::output_partitioning`] falls back to
+    /// [`Partitioning::UnknownPartitioning`].
+    pub output_partitioning: Option<Partitioning>,

Review Comment:
   Leaving this as a follow-up. I agree that `output_partitioning` gives us a
possible path to removing `partitioned_by_file_group` eventually, but doing it
in this PR would change the existing Hive partition grouping path on top of
adding declared output partitioning. This PR keeps that behavior intact.
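
   For reference, the fallback described in the new doc comment can be sketched
roughly as below. This is a minimal illustration, not the code in this PR: the
`Partitioning` enum is a simplified stand-in for DataFusion's type (column names
instead of physical expressions), `effective_output_partitioning` is a
hypothetical helper name, and returning `UnknownPartitioning` when nothing is
declared is an assumption made for the sketch.

```rust
/// Simplified stand-in for DataFusion's `Partitioning`. The real enum carries
/// physical expressions; plain column names are used here for illustration.
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    Hash(Vec<String>, usize),
    UnknownPartitioning(usize),
}

impl Partitioning {
    /// Number of output partitions this scheme describes.
    fn partition_count(&self) -> usize {
        match self {
            Partitioning::Hash(_, n) | Partitioning::UnknownPartitioning(n) => *n,
        }
    }
}

/// Hypothetical helper: report the declared partitioning only when its
/// partition count matches the number of file groups, otherwise fall back
/// to `UnknownPartitioning` over the file groups.
fn effective_output_partitioning(
    declared: Option<&Partitioning>,
    num_file_groups: usize,
) -> Partitioning {
    match declared {
        // The declaration is consistent with the scan layout, so trust it.
        Some(p) if p.partition_count() == num_file_groups => p.clone(),
        // Mismatch, or nothing declared (assumed behavior): claim nothing
        // beyond the partition count.
        _ => Partitioning::UnknownPartitioning(num_file_groups),
    }
}
```

   With a declared `Hash` partitioning over 4 partitions, a scan with 4 file
groups would report the declaration as-is, while a scan with 3 file groups
would report `UnknownPartitioning(3)`.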



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

