This is an automated email from the ASF dual-hosted git repository.

mneumann pushed a commit to branch upgrade-df-ver4601-a
in repository https://gitbox.apache.org/repos/asf/datafusion.git

commit a5b5560479f0500c24eb1198d8dbc7ecfafe94a5
Author: wiedld <dlw...@gmail.com>
AuthorDate: Tue Oct 15 12:51:48 2024 -0700

    chore: default=true for skip_physical_aggregate_schema_check, and add warn logging
---
 datafusion/common/src/config.rs                           | 2 +-
 datafusion/core/src/physical_planner.rs                   | 3 +++
 datafusion/sqllogictest/test_files/information_schema.slt | 4 ++--
 docs/source/user-guide/configs.md                         | 2 +-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/datafusion/common/src/config.rs b/datafusion/common/src/config.rs
index 65ecd40327..bddf8dc5d4 100644
--- a/datafusion/common/src/config.rs
+++ b/datafusion/common/src/config.rs
@@ -314,7 +314,7 @@ config_namespace! {
         ///
         /// This is used to workaround bugs in the planner that are now caught by
         /// the new schema verification step.
-        pub skip_physical_aggregate_schema_check: bool, default = false
+        pub skip_physical_aggregate_schema_check: bool, default = true
 
         /// Specifies the reserved memory for each spillable sort operation to
         /// facilitate an in-memory merge.
diff --git a/datafusion/core/src/physical_planner.rs b/datafusion/core/src/physical_planner.rs
index a74cdcc592..31daff7e5d 100644
--- a/datafusion/core/src/physical_planner.rs
+++ b/datafusion/core/src/physical_planner.rs
@@ -688,6 +688,9 @@ impl DefaultPhysicalPlanner {
                 differences.push(format!("field nullability at index {} [{}]: (physical) {} vs (logical) {}", i, physical_field.name(), physical_field.is_nullable(), logical_field.is_nullable()));
             }
         }
+
+        log::warn!("Physical input schema should be the same as the one converted from logical input schema, but did not match for logical plan:\n{}", input.display_indent());
+
         return internal_err!("Physical input schema should be the same as the one converted from logical input schema. Differences: {}", differences
             .iter()
             .map(|s| format!("\n\t- {}", s))
diff --git a/datafusion/sqllogictest/test_files/information_schema.slt b/datafusion/sqllogictest/test_files/information_schema.slt
index b0538b5e65..37c00bf1b4 100644
--- a/datafusion/sqllogictest/test_files/information_schema.slt
+++ b/datafusion/sqllogictest/test_files/information_schema.slt
@@ -224,7 +224,7 @@ datafusion.execution.parquet.writer_version 1.0
 datafusion.execution.planning_concurrency 13
 datafusion.execution.skip_partial_aggregation_probe_ratio_threshold 0.8
 datafusion.execution.skip_partial_aggregation_probe_rows_threshold 100000
-datafusion.execution.skip_physical_aggregate_schema_check false
+datafusion.execution.skip_physical_aggregate_schema_check true
 datafusion.execution.soft_max_rows_per_output_file 50000000
 datafusion.execution.sort_in_place_threshold_bytes 1048576
 datafusion.execution.sort_spill_reservation_bytes 10485760
@@ -321,7 +321,7 @@ datafusion.execution.parquet.writer_version 1.0 (writing) Sets parquet writer ve
 datafusion.execution.planning_concurrency 13 Fan-out during initial physical planning. This is mostly use to plan `UNION` children in parallel. Defaults to the number of CPU cores on the system
 datafusion.execution.skip_partial_aggregation_probe_ratio_threshold 0.8 Aggregation ratio (number of distinct groups / number of input rows) threshold for skipping partial aggregation. If the value is greater then partial aggregation will skip aggregation for further input
 datafusion.execution.skip_partial_aggregation_probe_rows_threshold 100000 Number of input rows partial aggregation partition should process, before aggregation ratio check and trying to switch to skipping aggregation mode
-datafusion.execution.skip_physical_aggregate_schema_check false When set to true, skips verifying that the schema produced by planning the input of `LogicalPlan::Aggregate` exactly matches the schema of the input plan. When set to false, if the schema does not match exactly (including nullability and metadata), a planning error will be raised. This is used to workaround bugs in the planner that are now caught by the new schema verification step.
+datafusion.execution.skip_physical_aggregate_schema_check true When set to true, skips verifying that the schema produced by planning the input of `LogicalPlan::Aggregate` exactly matches the schema of the input plan. When set to false, if the schema does not match exactly (including nullability and metadata), a planning error will be raised. This is used to workaround bugs in the planner that are now caught by the new schema verification step.
 datafusion.execution.soft_max_rows_per_output_file 50000000 Target number of rows in output files when writing multiple. This is a soft max, so it can be exceeded slightly. There also will be one file smaller than the limit if the total number of rows written is not roughly divisible by the soft max
 datafusion.execution.sort_in_place_threshold_bytes 1048576 When sorting, below what size should data be concatenated and sorted in a single RecordBatch rather than sorted in batches and merged.
 datafusion.execution.sort_spill_reservation_bytes 10485760 Specifies the reserved memory for each spillable sort operation to facilitate an in-memory merge. When a sort operation spills to disk, the in-memory data must be sorted and merged before being written to a file. This setting reserves a specific amount of memory for that in-memory sort/merge process. Note: This setting is irrelevant if the sort operation cannot spill (i.e., if there's no `DiskManager` configured).
diff --git a/docs/source/user-guide/configs.md b/docs/source/user-guide/configs.md
index 3f5fc53f1c..b1661937be 100644
--- a/docs/source/user-guide/configs.md
+++ b/docs/source/user-guide/configs.md
@@ -81,7 +81,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.execution.parquet.maximum_parallel_row_group_writers | 1 | (writing) By default parallel parquet writer is tuned for minimum memory usage in a streaming execution plan. You may see a performance benefit when writing large parquet files by increasing maximum_parallel_row_group_writers and maximum_buffered_record_batches_per_stream if your system has idle cores and can tolerate additional memory usage. Boosting these values is likely worthwhile [...]
 | datafusion.execution.parquet.maximum_buffered_record_batches_per_stream | 2 | (writing) By default parallel parquet writer is tuned for minimum memory usage in a streaming execution plan. You may see a performance benefit when writing large parquet files by increasing maximum_parallel_row_group_writers and maximum_buffered_record_batches_per_stream if your system has idle cores and can tolerate additional memory usage. Boosting these values is likely worthwhile [...]
 | datafusion.execution.planning_concurrency | 0 | Fan-out during initial physical planning. This is mostly use to plan `UNION` children in parallel. Defaults to the number of CPU cores on the system [...]
-| datafusion.execution.skip_physical_aggregate_schema_check | false | When set to true, skips verifying that the schema produced by planning the input of `LogicalPlan::Aggregate` exactly matches the schema of the input plan. When set to false, if the schema does not match exactly (including nullability and metadata), a planning error will be raised. This is used to workaround bugs in the planner that are now caught by the new schema verification step. [...]
+| datafusion.execution.skip_physical_aggregate_schema_check | true | When set to true, skips verifying that the schema produced by planning the input of `LogicalPlan::Aggregate` exactly matches the schema of the input plan. When set to false, if the schema does not match exactly (including nullability and metadata), a planning error will be raised. This is used to workaround bugs in the planner that are now caught by the new schema verification step. [...]
 | datafusion.execution.sort_spill_reservation_bytes | 10485760 | Specifies the reserved memory for each spillable sort operation to facilitate an in-memory merge. When a sort operation spills to disk, the in-memory data must be sorted and merged before being written to a file. This setting reserves a specific amount of memory for that in-memory sort/merge process. Note: This setting is irrelevant if the sort operation cannot spill (i.e., if there's [...]
 | datafusion.execution.sort_in_place_threshold_bytes | 1048576 | When sorting, below what size should data be concatenated and sorted in a single RecordBatch rather than sorted in batches and merged. [...]
 | datafusion.execution.meta_fetch_concurrency | 32 | Number of files to read in parallel when inferring schema and statistics [...]
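Note for anyone picking up this change: with the default flipped to `true`, an aggregate whose physical input schema diverges from the logical one now only logs a warning instead of failing planning. Below is a minimal sketch (not part of the commit) of how a downstream user could opt back into the strict check via `SessionConfig`; the query and values are illustrative only, and a `log` backend such as `env_logger` must be installed to actually see the new warning when the check stays disabled.

```rust
// Sketch only: re-enable the strict aggregate schema check that this commit
// skips by default. Assumes the `datafusion` crate and a tokio runtime.
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Default after this commit: the check is skipped and mismatches are only
    // logged via `log::warn!`. Flip it back to surface planner bugs as errors.
    let config = SessionConfig::new().set_bool(
        "datafusion.execution.skip_physical_aggregate_schema_check",
        false,
    );
    let ctx = SessionContext::new_with_config(config);

    // Illustrative aggregate query; with the check re-enabled, a mismatch
    // between the physical and logical input schemas of the aggregate would
    // fail planning with an internal error instead of a warning.
    let df = ctx
        .sql("SELECT x, count(*) FROM (VALUES (1), (2), (3)) AS t(x) GROUP BY x")
        .await?;
    df.show().await?;
    Ok(())
}
```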