zhangyue-hashdata commented on code in PR #724:
URL: https://github.com/apache/cloudberry/pull/724#discussion_r2009429137
##########
src/backend/executor/nodeSeqscan.c:
##########
@@ -87,8 +100,22 @@ SeqNext(SeqScanState *node)
/*
* get the next tuple from the table
*/
- if (table_scan_getnextslot(scandesc, direction, slot))
- return slot;
+ if (node->filter_in_seqscan && node->filters)
+ {
+ while (table_scan_getnextslot(scandesc, direction, slot))
+ {
+ if (!PassByBloomFilter(node, slot))
Review Comment:
> but bloom_create_aggresive maybe return NULL, buffer limit 1M ~2MB
Two factors make it highly likely that the Bloom filter is created
successfully. First, Bloom filters are typically built on small tables;
even when the overall dataset is as large as 1TB or 10TB, the small
tables in workloads such as TPC-DS remain relatively small. Second, the
data is distributed across segments, so the average volume of a small
table on each individual segment is smaller still. Taken together, even
at a data volume of 10TB, creation of the Bloom filter is very unlikely
to fail.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]