Copilot commented on code in PR #19932:
URL: https://github.com/apache/datafusion/pull/19932#discussion_r2714134861


##########
datafusion/common/src/config.rs:
##########
@@ -1115,6 +1115,18 @@ config_namespace! {
         /// See: 
<https://trino.io/docs/current/admin/dynamic-filtering.html#dynamic-filter-collection-thresholds>
         pub hash_join_inlist_pushdown_max_distinct_values: usize, default = 150
 
+        /// When true, pushes down hash table references for membership checks 
in hash joins
+        /// when the build side is too large for InList pushdown.
+        /// When false, no membership filter is created when InList thresholds 
are exceeded.
+        /// Default: true
+        pub hash_join_map_pushdown: bool, default = true
+
+        /// When true, pushes down min/max bounds for join key columns.
+        /// This enables statistics-based pruning (e.g., Parquet row group 
skipping).
+        /// When false, only membership filters (InList or Map) are pushed 
down.
+        /// Default: true
+        pub hash_join_bounds_pushdown: bool, default = true

Review Comment:
   The new configuration options `hash_join_map_pushdown` and 
`hash_join_bounds_pushdown` lack test coverage. While the PR description 
mentions existing hash join tests pass, there are no specific tests that 
validate the behavior when these options are set to false. Consider adding 
tests similar to the existing `test_hashjoin_hash_table_pushdown_partitioned` 
that verify:
   1. When `hash_join_map_pushdown` is false and build side exceeds InList 
thresholds, no membership filter is created (only bounds if enabled)
   2. When `hash_join_bounds_pushdown` is false, bounds predicates are not 
included in the filter
   3. When both are false but dynamic filter pushdown is enabled, appropriate 
behavior occurs
   4. Various combinations of these flags work correctly in both Partitioned 
and CollectLeft modes



##########
datafusion/common/src/config.rs:
##########
@@ -1115,6 +1115,18 @@ config_namespace! {
         /// See: 
<https://trino.io/docs/current/admin/dynamic-filtering.html#dynamic-filter-collection-thresholds>
         pub hash_join_inlist_pushdown_max_distinct_values: usize, default = 150
 
+        /// When true, pushes down hash table references for membership checks 
in hash joins
+        /// when the build side is too large for InList pushdown.
+        /// When false, no membership filter is created when InList thresholds 
are exceeded.

Review Comment:
   The documentation states "When false, no membership filter is created when 
InList thresholds are exceeded." This could be clearer. Consider rephrasing to: 
"When false, no hash table reference (Map) is used for membership checks. If 
the build side exceeds InList thresholds, no membership filter is created (only 
bounds, if enabled)." This makes it clearer that InList filters can still be 
created when the build side is small enough, and that this config specifically 
controls the Map fallback behavior.
   ```suggestion
           /// When true, pushes down hash table references (Map) for 
membership checks in hash joins
           /// when the build side is too large for InList pushdown.
           /// When false, no hash table reference (Map) is used for membership 
checks. If the build
           /// side exceeds InList thresholds, no membership filter is created 
(only bounds, if enabled).
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to