Re: [PR] Push the runtime filter from HashJoin down to SeqScan or AM. [cloudberry]

via GitHub Fri, 21 Mar 2025 03:59:43 -0700


zhangyue-hashdata commented on code in PR #724:
URL: https://github.com/apache/cloudberry/pull/724#discussion_r2007339619



##########
src/backend/executor/nodeSeqscan.c:
##########
@@ -87,8 +101,17 @@ SeqNext(SeqScanState *node)
        /*
         * get the next tuple from the table
         */
-       if (table_scan_getnextslot(scandesc, direction, slot))
+       while (table_scan_getnextslot(scandesc, direction, slot))
+       {
+               if (TupIsNull(slot))
+                       return slot;
+
+               if (node->filter_in_seqscan && node->filters &&
+                       !PassByBloomFilter(node, slot))

Review Comment:
   > > It determines whether using a Bloom filter for filtering data would be effective based on this evaluation
   > 
   > That makes sense, but where is the related code? I didn't see it in this PR. Does it compare the number of rows in the hash table with the number of rows in the probe table, and use the runtime filter only if the hash table has far fewer rows?
   
   The related code is bloom_create_aggresive() in src/backend/lib/bloomfilter.c.
   
   It creates a memory-efficient Bloom filter with the following characteristics:
   - Maximum size is limited to 2MB
   - Targets a false positive rate of about 10%
   - Uses minimal memory while maintaining acceptable performance
   - Returns NULL if the element count is too large to keep the false positive rate acceptable
   - Uses 2-3 hash functions, depending on the bits-per-element ratio
   
   The function is an aggressive variant of the regular bloom_create(): it prioritizes memory efficiency over false positive rate precision.
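
   As a rough illustration of the sizing decision described above, here is a minimal sketch. It is not the code from bloomfilter.c: the names plan_filter, FilterPlan, TARGET_BITS_PER_ELEM and MIN_BITS_PER_ELEM are hypothetical, and the cutoff values are assumptions chosen only to land near a 10% false positive rate under the standard estimate (1 - e^(-k/b))^k for k hash functions and b bits per element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 2MB cap comes from the comment above; the two constants below are
 * assumptions made for this sketch, not values taken from bloomfilter.c. */
#define MAX_FILTER_BITS      (UINT64_C(2) * 1024 * 1024 * 8)
#define TARGET_BITS_PER_ELEM 5  /* about 9% false positives with 3 hashes */
#define MIN_BITS_PER_ELEM    4  /* about 15% with 2 hashes; below this, give up */

/* Hypothetical result type: how large the filter is and how many hashes it uses. */
typedef struct FilterPlan
{
    uint64_t nbits;
    int      nhashes;
} FilterPlan;

/*
 * Decide whether a filter for `nelems` build-side rows is worth creating.
 * Returns 1 and fills *plan on success; returns 0 when the caller should
 * skip the filter (the analogue of the real function returning NULL).
 */
static int
plan_filter(uint64_t nelems, FilterPlan *plan)
{
    uint64_t want_bits;
    uint64_t nbits;
    uint64_t bits_per_elem;

    assert(plan != NULL);

    /* Nothing to filter on, or fewer than one bit per element even at the cap. */
    if (nelems == 0 || nelems > MAX_FILTER_BITS)
        return 0;

    /* Ask for the target ratio, but never exceed the 2MB cap. */
    want_bits = nelems * TARGET_BITS_PER_ELEM;
    nbits = want_bits < MAX_FILTER_BITS ? want_bits : MAX_FILTER_BITS;

    /* If the cap squeezed the ratio too far, the filter would pass too many rows. */
    bits_per_elem = nbits / nelems;
    if (bits_per_elem < MIN_BITS_PER_ELEM)
        return 0;

    plan->nbits = nbits;
    plan->nhashes = (bits_per_elem >= TARGET_BITS_PER_ELEM) ? 3 : 2;
    return 1;
}
```

   With these assumed constants, one million elements fit comfortably (5,000,000 bits, 3 hashes), four million elements hit the 2MB cap and fall back to 2 hashes, and five million elements are rejected because the capped filter would have fewer than 4 bits per element.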



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

