adriangb opened a new pull request, #18439:
URL: https://github.com/apache/datafusion/pull/18439

   ## Summary
   
   Completes the dynamic filter pushdown feature by implementing hash table reference pushdown for hash joins whose build side exceeds the InList threshold (default 128KB). For large build sides, sharing a reference to the build side's hash table is more efficient than materializing its values into an InList expression.
   
   ## Changes
   
   - Merged PR4 (hash join refactor), PR5 (hash expressions), and PR6 (InList pushdown)
   - Implemented `PushdownStrategy::HashTable` using `HashTableLookupExpr`
   - Shared the build side's hash table `Arc` with probe-side filters (zero-copy)
   - Ensured consistent hashing by sharing `random_state` between the build and probe sides
   - Added a test for the hash table lookup strategy
   - Updated all test snapshots with correct InList and `hash_lookup` expectations
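
   The `Arc` sharing and `random_state` points can be illustrated with a minimal, std-only sketch. `BuildSideTable`, `HashLookupFilter`, and `may_match` are hypothetical names, not DataFusion's API, and std's `RandomState` stands in for whatever hasher the join actually uses. The filter holds an `Arc` to the build side's table, so no keys are copied, and it always hashes probe keys with the table's own `RandomState`, because a differently seeded hasher would produce unrelated hash values.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashSet;
use std::hash::BuildHasher;
use std::sync::Arc;

/// Hypothetical stand-in for the build side's hash table: it keeps the
/// hashes of the join keys plus the RandomState used to compute them.
pub struct BuildSideTable {
    pub random_state: RandomState,
    pub key_hashes: HashSet<u64>,
}

impl BuildSideTable {
    /// Hash every build-side key with the given RandomState.
    pub fn build(keys: &[i64], random_state: RandomState) -> Self {
        let key_hashes = keys.iter().map(|k| random_state.hash_one(k)).collect();
        Self { random_state, key_hashes }
    }
}

/// Hypothetical probe-side filter in the spirit of `HashTableLookupExpr`:
/// it holds an Arc to the build side's table rather than a copy of it.
pub struct HashLookupFilter {
    table: Arc<BuildSideTable>,
}

impl HashLookupFilter {
    pub fn new(table: Arc<BuildSideTable>) -> Self {
        Self { table }
    }

    /// Returns false only when the probe key certainly has no match.
    pub fn may_match(&self, key: i64) -> bool {
        // Reuse the build side's RandomState so that equal keys produce
        // equal hashes on both sides of the join.
        let h = self.table.random_state.hash_one(key);
        self.table.key_hashes.contains(&h)
    }
}
```

   Because this sketch stores only hashes, a hit can be a false positive; that is acceptable for a pruning filter, since the join itself still checks key equality.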
   
   ## Strategy Switching
   
   The implementation selects the strategy automatically, based on the size of the build side:
   
   - **Small build side** (≤128KB): `PushdownStrategy::InList(array)`: efficient InList with statistics pruning
   - **Large build side** (>128KB): `PushdownStrategy::HashTable(hash_map)`: shared hash table reference
   - **Empty partition**: `PushdownStrategy::Empty`: no filter
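
   The selection rule above can be sketched as follows. This is a simplified, hypothetical mirror of `PushdownStrategy`: the payloads are a `Vec<i64>` and an `Arc<HashSet<i64>>` rather than an Arrow array and the join's hash map, and estimating the build side's size as key count times 8 bytes is an assumption made for illustration.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Default InList threshold from the description: 128KB.
pub const INLIST_THRESHOLD_BYTES: usize = 128 * 1024;

/// Hypothetical, simplified mirror of the three variants named in the PR.
#[derive(Debug)]
pub enum PushdownStrategy {
    /// Build-side values copied into an InList-style list.
    InList(Vec<i64>),
    /// Shared reference to the build side's table (no copy).
    HashTable(Arc<HashSet<i64>>),
    /// Empty build partition: nothing to push down.
    Empty,
}

pub fn select_strategy(
    build_keys: &[i64],
    table: &Arc<HashSet<i64>>,
    threshold_bytes: usize,
) -> PushdownStrategy {
    if build_keys.is_empty() {
        return PushdownStrategy::Empty;
    }
    // Assumed size estimate: number of keys times the size of one i64.
    let estimated_bytes = build_keys.len() * std::mem::size_of::<i64>();
    if estimated_bytes <= threshold_bytes {
        // Small build side: materialize the values.
        PushdownStrategy::InList(build_keys.to_vec())
    } else {
        // Large build side: bump the refcount instead of copying.
        PushdownStrategy::HashTable(Arc::clone(table))
    }
}
```

   With `i64` keys, 16,384 keys (exactly 128KB) still produce an `InList`, while 16,385 keys switch to the shared `HashTable` reference.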
   
   ## Testing
   
   - ✅ All clippy checks pass with `-D warnings`
   - ✅ All 22 filter pushdown tests passing
   - ✅ Updated 5 test snapshots with correct expectations
   - ✅ New test validates hash table lookup for large build sides
   
   ## Part of Multi-PR Strategy
   
   This is **PR7 of 7** in the InList pushdown feature breakdown: the final feature PR!
   
   **Tier**: Features (Tier 3)
   **Dependencies**: PR4 (hash join refactor), PR5 (hash expressions), PR6 (InList pushdown)
   **Can merge**: After dependencies merge

