gene-bordegaray commented on code in PR #22657:
URL: https://github.com/apache/datafusion/pull/22657#discussion_r3401722408
##########
datafusion/catalog-listing/src/options.rs:
##########
@@ -61,6 +64,17 @@ pub struct ListingOptions {
/// multiple equivalent orderings, the outer `Vec` will have a
/// single element.
pub file_sort_order: Vec<Vec<SortExpr>>,
+ /// Optional declared output partitioning for this table.
+ ///
+ /// Expressions are specified against the full table schema. When set,
+ /// [`ListingTable`](crate::ListingTable) creates one scan file group per
+ /// declared output partition instead of using [`Self::target_partitions`].
+ /// Empty file groups are added when needed to preserve that count.
+ ///
+ /// Files are sorted by path before grouping. DataFusion does not validate
+ /// that rows match the declaration, so callers must ensure file group `i`
+ /// contains only rows for declared output partition `i`.
+ pub output_partitioning: Option<Partitioning>,
Review Comment:
Addressed. `ListingOptions` now stores `datafusion_expr::Partitioning`, and
`ListingTable` converts the logical declaration to physical output partitioning
during scan planning.
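The doc comment on `output_partitioning` describes the grouping rule. Here is a minimal standalone Rust sketch of that rule. It assumes contiguous chunking with ceiling division, because the excerpt does not say how files are assigned to groups. `ListedFile` and `group_for_declared_partitions` are hypothetical names and are not DataFusion APIs. The sketch shows why padding can be needed: splitting 5 files into 4 partitions with chunks of `ceil(5 / 4) = 2` files produces only 3 groups.

```rust
/// Hypothetical stand-in for a listed file; only the path matters here.
#[derive(Debug, Clone, PartialEq)]
struct ListedFile {
    path: String,
}

/// Sorts files by path, splits them into contiguous chunks of
/// `ceil(len / n)` files, then pads with empty groups so that exactly
/// `n` groups are returned. Assumes contiguous chunking; this is a
/// sketch, not DataFusion's implementation.
fn group_for_declared_partitions(mut files: Vec<ListedFile>, n: usize) -> Vec<Vec<ListedFile>> {
    assert!(n > 0, "declared partition count must be positive");

    // Sort by path so the grouping is deterministic across listings.
    files.sort_by(|a, b| a.path.cmp(&b.path));

    let mut groups: Vec<Vec<ListedFile>> = Vec::with_capacity(n);
    if !files.is_empty() {
        // Ceiling division, so at most `n` chunks are produced.
        let chunk_size = files.len().div_ceil(n);
        for chunk in files.chunks(chunk_size) {
            groups.push(chunk.to_vec());
        }
    }

    // Chunking can produce fewer than `n` groups (for example, 5 files
    // into 4 partitions gives 3 groups), so add empty trailing groups.
    groups.resize_with(n, Vec::new);
    groups
}

fn main() {
    let files: Vec<ListedFile> = ["c.parquet", "a.parquet", "b.parquet"]
        .iter()
        .map(|p| ListedFile { path: p.to_string() })
        .collect();

    let groups = group_for_declared_partitions(files, 4);
    for (i, group) in groups.iter().enumerate() {
        let paths: Vec<&str> = group.iter().map(|f| f.path.as_str()).collect();
        println!("partition {i}: {paths:?}");
    }
}
```

With 3 files and 4 declared partitions, each file gets its own group and partition 3 is left as an empty trailing group. The scan therefore still reports 4 partitions, which matches the declared count. As the doc comment warns, nothing here checks that the rows in group `i` actually belong to partition `i`.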
##########
datafusion/catalog-listing/src/table.rs:
##########
@@ -690,12 +715,45 @@ impl ListingTable {
/// Get the list of files for a scan as well as the file level statistics.
/// The list is grouped to let the execution plan know how the files should
/// be distributed to different threads / executors.
+ ///
+ /// If [`ListingOptions::output_partitioning`] is set, the returned file
+ /// groups preserve that declared partition count, including empty trailing
+ /// groups when needed, rather than using
+ /// [`ListingOptions::target_partitions`].
pub async fn list_files_for_scan<'a>(
&'a self,
ctx: &'a dyn Session,
filters: &'a [Expr],
limit: Option<usize>,
) -> datafusion_common::Result<ListFilesResult> {
+ let declared_output_partitioning =
+ self.options.output_partitioning.as_ref();
+ let target_partitions = declared_output_partitioning
+ .map(Partitioning::partition_count)
+ .unwrap_or(self.options.target_partitions);
+ self.list_files_for_scan_with_target(
+ ctx,
+ filters,
+ limit,
+ target_partitions,
+ declared_output_partitioning.is_some(),
+ )
+ .await
+ }
+
+ async fn list_files_for_scan_with_target<'a>(
+ &'a self,
+ ctx: &'a dyn Session,
+ filters: &'a [Expr],
+ limit: Option<usize>,
+ target_partitions: usize,
Review Comment:
Addressed by removing `list_files_for_scan_with_target`. `list_files_for_scan`
now derives both the target partition count and the declared-partitioning
behavior (preserving empty file groups) directly from `self.options`.
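The count selection that `list_files_for_scan` performs can be modeled on its own. This is a sketch under stated assumptions. `DeclaredPartitioning`, `Options`, and `scan_target` are hypothetical simplifications, not DataFusion types, and the `customer_id` column is illustrative. Per the review comment above, the real options now hold a logical `datafusion_expr::Partitioning`, which is converted to physical partitioning during scan planning.

```rust
/// Hypothetical, simplified model of a declared partitioning. DataFusion's
/// real `Partitioning` types have different variants and fields.
#[allow(dead_code)] // `columns` is illustrative only and never read
#[derive(Debug, Clone)]
enum DeclaredPartitioning {
    Hash { columns: Vec<String>, count: usize },
    RoundRobin(usize),
}

impl DeclaredPartitioning {
    /// Number of output partitions this declaration promises.
    fn partition_count(&self) -> usize {
        match self {
            DeclaredPartitioning::Hash { count, .. } => *count,
            DeclaredPartitioning::RoundRobin(count) => *count,
        }
    }
}

/// Hypothetical stand-in for the two `ListingOptions` fields involved.
struct Options {
    target_partitions: usize,
    output_partitioning: Option<DeclaredPartitioning>,
}

/// Returns the number of file groups to build, and whether the count
/// comes from a declaration (in which case empty groups must be kept).
fn scan_target(options: &Options) -> (usize, bool) {
    let declared = options.output_partitioning.as_ref();
    let target = declared
        .map(DeclaredPartitioning::partition_count)
        .unwrap_or(options.target_partitions);
    (target, declared.is_some())
}

fn main() {
    let default = Options { target_partitions: 8, output_partitioning: None };
    let hashed = Options {
        target_partitions: 8,
        output_partitioning: Some(DeclaredPartitioning::Hash {
            columns: vec!["customer_id".to_string()],
            count: 4,
        }),
    };
    let round_robin = Options {
        target_partitions: 8,
        output_partitioning: Some(DeclaredPartitioning::RoundRobin(2)),
    };

    println!("{:?}", scan_target(&default)); // (8, false)
    println!("{:?}", scan_target(&hashed)); // (4, true)
    println!("{:?}", scan_target(&round_robin)); // (2, true)
}
```

The empty-group flag is derived from `is_some()`, so a declared partitioning always takes precedence over `target_partitions`. This holds even when the declared count exceeds the number of listed files.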