Copilot commented on code in PR #19073:
URL: https://github.com/apache/datafusion/pull/19073#discussion_r2620369944
##########
datafusion/core/src/physical_planner.rs:
##########
@@ -1599,6 +1603,25 @@ impl DefaultPhysicalPlanner {
}
}
+fn has_sufficient_rows_for_repartition(
+ input: &Arc<dyn ExecutionPlan>,
+ session_state: &SessionState,
+) -> Result<bool> {
+ // Get partition statistics, default to repartitioning if unavailable
+ let stats = match input.partition_statistics(None) {
+ Ok(s) => s,
+ Err(_) => return Ok(true),
+ };
Review Comment:
The error from `partition_statistics()` is silently discarded. Consider
logging the error at debug level before defaulting to repartitioning, as this
would help diagnose cases where statistics are unexpectedly unavailable.
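A minimal standalone sketch of the suggested pattern. The `stats_or_default` helper, its signature, and the `eprintln!` call are hypothetical stand-ins for illustration; in DataFusion the real code would likely use `log::debug!` on the actual `partition_statistics` error before returning `Ok(true)`:

```rust
// Hypothetical sketch: surface the error before falling back to a default,
// instead of silently discarding it with `Err(_) => return Ok(true)`.
fn stats_or_default(res: Result<u64, String>) -> u64 {
    match res {
        Ok(rows) => rows,
        Err(e) => {
            // In DataFusion this would be `log::debug!` rather than eprintln!.
            eprintln!("partition statistics unavailable, defaulting to repartition: {e}");
            // Default: assume enough rows, so repartitioning proceeds.
            u64::MAX
        }
    }
}

fn main() {
    assert_eq!(stats_or_default(Ok(42)), 42);
    assert_eq!(stats_or_default(Err("io error".to_string())), u64::MAX);
    println!("ok");
}
```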
##########
datafusion/core/src/physical_planner.rs:
##########
@@ -3215,9 +3238,18 @@ mod tests {
#[tokio::test]
async fn hash_agg_group_by_partitioned_on_dicts() -> Result<()> {
- let dict_array: DictionaryArray<Int32Type> =
- vec!["A", "B", "A", "A", "C", "A"].into_iter().collect();
- let val_array: Int32Array = vec![1, 2, 2, 4, 1, 1].into();
+ // Use a larger dataset to ensure repartitioning still happens even after
+ // enabling the small dataset optimization.
+ let dict_values: Vec<&str> = (0..10_000)
+ .map(|i| match i % 4 {
+ 0 => "A",
+ 1 => "B",
+ 2 => "C",
+ _ => "D",
+ })
+ .collect();
Review Comment:
The magic number 10_000 is used to create a dataset large enough to trigger
repartitioning. Consider extracting this as a constant (e.g.,
`LARGE_DATASET_SIZE`) to clarify its purpose and make it easier to adjust if
the batch size threshold changes.
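A standalone sketch of the suggested extraction. `LARGE_DATASET_SIZE` is the proposed name from this comment, not an existing constant in the codebase, and the sketch omits the `DictionaryArray` construction from the real test:

```rust
// Hypothetical constant replacing the magic number 10_000; chosen to be
// large enough that the small-dataset optimization does not suppress
// repartitioning in the test.
const LARGE_DATASET_SIZE: usize = 10_000;

fn main() {
    let dict_values: Vec<&str> = (0..LARGE_DATASET_SIZE)
        .map(|i| match i % 4 {
            0 => "A",
            1 => "B",
            2 => "C",
            _ => "D",
        })
        .collect();
    assert_eq!(dict_values.len(), LARGE_DATASET_SIZE);
    assert_eq!(dict_values[0], "A");
    assert_eq!(dict_values[3], "D");
    println!("ok");
}
```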
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]