gene-bordegaray commented on code in PR #22657:
URL: https://github.com/apache/datafusion/pull/22657#discussion_r3401723149
##########
datafusion/datasource/src/file_scan_config/mod.rs:
##########
@@ -206,6 +206,13 @@ pub struct FileScanConfig {
/// If the number of file partitions > target_partitions, the file partitions will be grouped
/// in a round-robin fashion such that number of file partitions = target_partitions.
pub partitioned_by_file_group: bool,
+ /// Optional declared output partitioning of this file scan.
+ ///
+ /// Expressions are in terms of the full table schema, before scan
+ /// projection or filtering. If the partition count does not match the
+ /// number of file groups, [`DataSource::output_partitioning`] falls back to
+ /// [`Partitioning::UnknownPartitioning`].
+ pub output_partitioning: Option<Partitioning>,
Review Comment:
Leaving this as a follow-up. I agree that `output_partitioning` gives us a possible
path to eventually remove `partitioned_by_file_group`, but doing it here would
change the existing Hive partition grouping path in addition to adding declared
output partitioning. This PR keeps that behavior intact.
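The fallback in the new doc comment on `output_partitioning` can be sketched as follows. This is a minimal, self-contained illustration, not DataFusion's code. `Partitioning` here is a simplified stand-in that uses column names instead of `PhysicalExpr`s. `effective_output_partitioning` is a hypothetical helper. It also assumes the fallback `UnknownPartitioning` carries the file group count, which the doc comment does not state.

```rust
// Simplified stand-in for DataFusion's `Partitioning` (assumption: the real
// enum has more variants and holds `Arc<dyn PhysicalExpr>` rather than names).
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    Hash(Vec<String>, usize),
    UnknownPartitioning(usize),
}

impl Partitioning {
    fn partition_count(&self) -> usize {
        match self {
            Partitioning::Hash(_, n) | Partitioning::UnknownPartitioning(n) => *n,
        }
    }
}

/// Hypothetical helper: honor the declared partitioning only when its
/// partition count equals the number of file groups; otherwise (or when
/// nothing is declared) fall back to unknown partitioning.
fn effective_output_partitioning(
    declared: Option<&Partitioning>,
    num_file_groups: usize,
) -> Partitioning {
    match declared {
        Some(p) if p.partition_count() == num_file_groups => p.clone(),
        // Assumption: the fallback reports the file group count.
        _ => Partitioning::UnknownPartitioning(num_file_groups),
    }
}

fn main() {
    let declared = Partitioning::Hash(vec!["region".to_string()], 4);
    // Four file groups match the declared count, so the declaration is kept.
    println!("{:?}", effective_output_partitioning(Some(&declared), 4));
    // Three file groups do not match, so the scan reports unknown partitioning.
    println!("{:?}", effective_output_partitioning(Some(&declared), 3));
}
```

Checking the count rather than trusting the declaration keeps a mismatched hint from being reported as a guarantee the scan cannot uphold.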