Re: [PR] Add declared file scan output partitioning [datafusion]

via GitHub Fri, 12 Jun 2026 01:17:15 -0700


gene-bordegaray commented on code in PR #22657:
URL: https://github.com/apache/datafusion/pull/22657#discussion_r3401723149



##########
datafusion/datasource/src/file_scan_config/mod.rs:
##########
@@ -206,6 +206,13 @@ pub struct FileScanConfig {
     /// If the number of file partitions > target_partitions, the file partitions will be grouped
     /// in a round-robin fashion such that number of file partitions = target_partitions.
     pub partitioned_by_file_group: bool,
+    /// Optional declared output partitioning of this file scan.
+    ///
+    /// Expressions are in terms of the full table schema, before scan
+    /// projection or filtering. If the partition count does not match the
+    /// number of file groups, [`DataSource::output_partitioning`] falls back to
+    /// [`Partitioning::UnknownPartitioning`].
+    pub output_partitioning: Option<Partitioning>,

Review Comment:
   Leaving this as a follow-up. I think output_partitioning opens the door to
   removing partitioned_by_file_group, but doing it here would change the Hive
   partition grouping path, and I'm not sure we should do that in the same PR,
   since this one is already quite large.
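
   For reference, the fallback described in the doc comment above could be
   sketched roughly as follows. This is only an illustration under stated
   assumptions: the `Partitioning` enum here is a simplified stand-in rather
   than the real DataFusion type, `effective_output_partitioning` is a
   hypothetical helper name, and treating an undeclared (`None`) partitioning
   the same as a mismatch is an assumption, not something the diff confirms.

```rust
// Simplified stand-in for illustration only; NOT the real DataFusion type.
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    /// Hash partitioning on the named columns into `n` partitions.
    Hash(Vec<String>, usize),
    /// No known partitioning scheme, only a partition count.
    UnknownPartitioning(usize),
}

impl Partitioning {
    /// Number of partitions this scheme describes.
    fn partition_count(&self) -> usize {
        match self {
            Partitioning::Hash(_, n) | Partitioning::UnknownPartitioning(n) => *n,
        }
    }
}

/// Hypothetical helper: keep the declared partitioning only when its
/// partition count matches the number of file groups; otherwise fall back
/// to `UnknownPartitioning` sized to the file groups.
/// Assumption: `None` (nothing declared) takes the same fallback.
fn effective_output_partitioning(
    declared: Option<&Partitioning>,
    file_group_count: usize,
) -> Partitioning {
    match declared {
        Some(p) if p.partition_count() == file_group_count => p.clone(),
        _ => Partitioning::UnknownPartitioning(file_group_count),
    }
}
```

   With four file groups, a declared `Hash(["a"], 4)` would be kept, while a
   declared `Hash(["a"], 8)` would degrade to `UnknownPartitioning(4)`.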



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


