Shekharrajak opened a new issue, #3053:
URL: https://github.com/apache/datafusion-comet/issues/3053
### What is the problem the feature request solves?
Runtime filters are lightweight data structures (IN sets, Min/Max bounds,
Bloom filters) built during the hash join **build phase** and pushed down to
**scan operators**, so that rows which cannot match the join are filtered out
before they are read from storage. This can reduce I/O by as much as **90%**
and improve query performance for join-heavy workloads.
Example:
```sql
-- User writes standard SQL
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'USA';
```
1. Comet detects the join opportunity.
2. Comet builds a filter from the `customers` table during the join build phase.
3. Comet applies the filter to the `orders` scan automatically.
4. The query runs **faster** with **less I/O**.
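The steps above can be sketched with plain Scala collections. This is only an illustration of the idea, not Comet's implementation: the `Customer` and `Order` case classes and both function names are hypothetical, and a real engine would push the filter into the scan operator instead of filtering an in-memory `Seq`.

```scala
object InFilterSketch {
  // Hypothetical, simplified rows; these are not Comet or Spark types.
  final case class Customer(id: Long, name: String, country: String)
  final case class Order(orderId: Long, customerId: Long, amount: Double)

  // Build side: collect the join keys that survive the WHERE clause.
  def buildKeyFilter(customers: Seq[Customer], country: String): Set[Long] =
    customers.filter(_.country == country).map(_.id).toSet

  // Probe side: drop rows whose key cannot match, before the join runs.
  def scanOrders(orders: Seq[Order], keys: Set[Long]): Seq[Order] =
    orders.filter(o => keys.contains(o.customerId))

  def main(args: Array[String]): Unit = {
    val customers = Seq(
      Customer(1L, "Ann", "USA"),
      Customer(2L, "Bo", "DE"),
      Customer(3L, "Cy", "USA"))
    val orders = Seq(
      Order(10L, 1L, 5.0),
      Order(11L, 2L, 7.5),
      Order(12L, 3L, 1.0),
      Order(13L, 4L, 9.0))

    val keys = buildKeyFilter(customers, "USA")
    val scanned = scanOrders(orders, keys)
    println(s"scanned ${scanned.size} of ${orders.size} orders") // scanned 2 of 4 orders
  }
}
```

Only the orders for customers 1 and 3 reach the join; the other rows are skipped at scan time.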
### Filter Types
1. **IN Filter** (small cardinality, <1000)
2. **Min/Max Filter** (numeric/date types)
   - Min and max bounds
3. **Bloom Filter** (large cardinality, future)
   - Probabilistic data structure
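To make the two less obvious filter types concrete, here is a minimal sketch of a Min/Max filter and a Bloom filter (an IN filter is simply a `Set` of keys). All names are hypothetical, and the hashing scheme, which seeds `MurmurHash3.stringHash` with the hash index, is an assumption chosen for brevity rather than what Comet, Spark, or DataFusion use.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

object FilterTypesSketch {
  // Min/Max filter: keeps only the bounds of the build-side keys.
  final case class MinMaxFilter(min: Long, max: Long) {
    def mightContain(key: Long): Boolean = key >= min && key <= max
  }

  // Bloom filter: may report false positives, never false negatives.
  final class BloomFilter(numBits: Int, numHashes: Int) {
    require(numBits > 0 && numHashes > 0, "sizes must be positive")
    private val bits = mutable.BitSet.empty

    // Derive the i-th bit position by using i as the hash seed.
    private def position(key: Long, i: Int): Int =
      java.lang.Math.floorMod(MurmurHash3.stringHash(key.toString, i), numBits)

    def add(key: Long): Unit =
      (0 until numHashes).foreach(i => bits += position(key, i))

    def mightContain(key: Long): Boolean =
      (0 until numHashes).forall(i => bits.contains(position(key, i)))
  }
}
```

Both filters answer "might this key be on the build side?". A `false` lets the scan skip the row safely, while a `true` still has to be confirmed by the join itself.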
The system should automatically select the optimal filter type:
- Numeric/date → Min/Max Filter (most efficient)
- Small cardinality → IN Filter
- Large cardinality → Bloom Filter
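One possible reading of these rules is sketched below. It assumes the bullets are checked in the order listed (key type first, then cardinality) and that "small" means fewer distinct build-side keys than the IN filter threshold. The issue does not spell out either point, and `FilterKind` and `choose` are hypothetical names.

```scala
object FilterSelectionSketch {
  sealed trait FilterKind
  case object MinMax extends FilterKind
  case object InSet extends FilterKind
  case object Bloom extends FilterKind

  // Hypothetical inputs: whether the join key is numeric or a date, and the
  // estimated number of distinct keys on the build side.
  def choose(
      isNumericOrDate: Boolean,
      distinctKeys: Long,
      inFilterThreshold: Long = 1000L): FilterKind =
    if (isNumericOrDate) MinMax                      // bounds are cheapest to build and test
    else if (distinctKeys < inFilterThreshold) InSet // few keys: exact membership
    else Bloom                                       // many keys: probabilistic membership
}
```

For example, `FilterSelectionSketch.choose(isNumericOrDate = false, distinctKeys = 50000L)` returns `Bloom` under these assumptions.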
Users should be able to control runtime filters via Spark configuration:
```scala
// Enable/disable runtime filters
spark.conf.set("spark.comet.runtimeFilter.enabled", true)
// Adjust thresholds
spark.conf.set("spark.comet.runtimeFilter.inFilterThreshold", 1000)
spark.conf.set("spark.comet.runtimeFilter.bloomFilterFpp", 0.01)
```
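A sketch of how these settings might be read and validated follows. The three keys are the ones proposed above; the `RuntimeFilterConf` case class, the `fromMap` helper, and the fallback values (taken from the example values above) are assumptions, not existing Comet configuration.

```scala
object RuntimeFilterConfSketch {
  final case class RuntimeFilterConf(
      enabled: Boolean,
      inFilterThreshold: Int,
      bloomFilterFpp: Double)

  // Keys come from the proposal; the fallback values are assumptions.
  def fromMap(conf: Map[String, String]): RuntimeFilterConf = {
    val parsed = RuntimeFilterConf(
      enabled =
        conf.getOrElse("spark.comet.runtimeFilter.enabled", "true").toBoolean,
      inFilterThreshold =
        conf.getOrElse("spark.comet.runtimeFilter.inFilterThreshold", "1000").toInt,
      bloomFilterFpp =
        conf.getOrElse("spark.comet.runtimeFilter.bloomFilterFpp", "0.01").toDouble)

    // Reject values that would make the filters meaningless.
    require(parsed.inFilterThreshold > 0, "inFilterThreshold must be positive")
    require(
      parsed.bloomFilterFpp > 0.0 && parsed.bloomFilterFpp < 1.0,
      "bloomFilterFpp must be in (0, 1)")
    parsed
  }
}
```

In a Spark session, the input map could come from `spark.conf.getAll`, which returns the session settings as a `Map[String, String]`.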
### Describe the potential solution
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]