zhangyue-hashdata commented on code in PR #724:
URL: https://github.com/apache/cloudberry/pull/724#discussion_r2007339619
##########
src/backend/executor/nodeSeqscan.c:
##########
@@ -87,8 +101,17 @@ SeqNext(SeqScanState *node)
/*
* get the next tuple from the table
*/
- if (table_scan_getnextslot(scandesc, direction, slot))
+ while (table_scan_getnextslot(scandesc, direction, slot))
+ {
+ if (TupIsNull(slot))
+ return slot;
+
+ if (node->filter_in_seqscan && node->filters &&
+ !PassByBloomFilter(node, slot))
Review Comment:
> > It determines whether using a Bloom filter for filtering data would be
> > effective based on this evaluation.
>
> That makes sense, but where is the related code? I didn't see it in this
> PR. Does it compare the number of rows in the hash table output with the
> number of rows in the probe table, and use the runtime filter only when
> the hash table has far fewer rows than the probe table?
The related code is bloom_create_aggresive() in src/backend/lib/bloomfilter.c.
It creates a memory-efficient Bloom filter with the following characteristics:
- Maximum size is limited to 2MB
- Targets a false positive rate of about 10%
- Uses minimal memory while maintaining acceptable performance
- Returns NULL if the number of elements is too large to keep the false positive rate acceptable
- Uses 2 or 3 hash functions, depending on the bits-per-element ratio
It is a more aggressive variant of the regular bloom_create(), prioritizing
memory efficiency over false positive rate precision.
--
This is an automated message from the Apache Git Service.