gene-bordegaray commented on code in PR #22657:
URL: https://github.com/apache/datafusion/pull/22657#discussion_r3401723019
##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -1286,6 +1288,80 @@ mod tests {
Ok(())
}
+ #[tokio::test]
+ async fn test_list_files_uses_declared_output_partitioning_count() ->
Result<()> {
+ let files = ["bucket/key-prefix/file0", "bucket/key-prefix/file1"];
+
+ let ctx = SessionContext::new();
+ register_test_store(&ctx, &files.iter().map(|f| (*f,
10)).collect::<Vec<_>>());
+
+ let opt = ListingOptions::new(Arc::new(JsonFormat::default()))
+ .with_file_extension_opt(Some(""))
+ .with_target_partitions(1)
+ .with_output_partitioning(Some(Partitioning::RoundRobinBatch(4)));
+
Review Comment:
Partially addressed. output_partitioning is now authoritative when set and
the docs call out that target_partitions is ignored for declared output
partitioning. I left with_target_partitions itself unchanged to preserve the
existing API; deprecating or reinterpreting it as RoundRobinBatch seems better
as a separate follow-up.
##########
datafusion/catalog-listing/src/options.rs:
##########
@@ -42,8 +43,10 @@ pub struct ListingOptions {
/// This can add a lot of overhead as it will usually require files
/// to be opened and at least partially parsed.
pub collect_stat: bool,
- /// Group files to avoid that the number of partitions exceeds
- /// this limit
+ /// Group files to avoid that the number of partitions exceeds this limit.
+ ///
+ /// If [`Self::output_partitioning`] is set, its partition count is used
+ /// instead, even when it exceeds this value.
pub target_partitions: usize,
Review Comment:
Addressed in the latest push. Declared output partitioning is now
authoritative: with_output_partitioning aligns target_partitions to the
declared partition count, and scan planning validates the resulting file group
count before advertising the partitioning.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]