Re: [PR] Push the runtime filter from HashJoin down to SeqScan or AM. [cloudberry]

via GitHub Wed, 04 Dec 2024 05:57:43 -0800


zhangyue-hashdata commented on PR #724:
URL: https://github.com/apache/cloudberry/pull/724#issuecomment-2517322395


   > Looks interesting. And I have some questions to discuss.
   > 
   > * Besides the seqscan, can the runtime filter apply to other types of 
scan, such as the index scan?
   > * It looks like the runtime filter can only be used when the `hashjoin` 
node and `seqscan` node run in the same process, which means the tables must 
have the same distribution policy on the join columns, or one of the tables 
must be replicated.
   
   * Besides the seqscan, can the runtime filter apply to other types of scan, 
such as the index scan?
   > Theoretically, it is feasible to apply runtime filters to other scan 
operators such as Index Scan. However, because Index Scan already reduces the 
data volume by leveraging an optimized storage structure, applying runtime 
filters to it would likely yield only minimal performance gains.
   
   > In subsequent work, if we find that other scan operators gain notable 
performance improvements from pushed-down runtime filters, we will add support 
for them. Our focus will be on operators where a runtime filter can 
substantially reduce the amount of data processed early in query execution, 
since that is where the performance benefit is largest.
   
   * It looks like the runtime filter can only be used when the `hashjoin` 
node and `seqscan` node run in the same process, which means the tables must 
have the same distribution policy on the join columns, or one of the tables 
must be replicated.
   > Yes, the current pushdown runtime filter only supports in-process 
pushdown, which means that the Hash Join and SeqScan need to be within the same 
process. The design and implementation of cross-process pushdown runtime 
filters are much more complex.
   
   > This limitation arises because sharing a runtime filter, such as a Bloom 
filter, across processes involves additional challenges: inter-process 
communication (IPC), synchronization, and keeping the filter consistent and 
efficient across process boundaries. Addressing these issues requires a more 
sophisticated design suited to a distributed execution environment.
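
As a rough illustration of the in-process case described above, here is a 
minimal Python sketch, not the Cloudberry implementation. The build side of 
the hash join fills a Bloom filter, and the scan on the probe side consults it 
before a row ever reaches the join. The names `BloomFilter` and 
`hash_join_with_runtime_filter`, the filter size, and the SHA-256 based 
hashing are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustration values, not tuned defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def hash_join_with_runtime_filter(build_rows, probe_rows, key):
    """Join two lists of dicts on `key`, filtering probe rows during the scan.

    Returns (joined_rows, rows_scanned, rows_passed_to_join).
    """
    # Build phase: populate the hash table and the runtime filter together.
    bloom = BloomFilter()
    hash_table = {}
    for row in build_rows:
        hash_table.setdefault(row[key], []).append(row)
        bloom.add(row[key])

    # Probe phase: the "scan" and the join live in the same process, so the
    # scan can read the filter directly, with no IPC or synchronization.
    scanned = 0
    passed = 0
    joined = []
    for row in probe_rows:
        scanned += 1
        if not bloom.might_contain(row[key]):
            continue  # dropped at scan time, never reaches the join
        passed += 1
        for match in hash_table.get(row[key], []):
            joined.append({**match, **row})
    return joined, scanned, passed
```

With a small build side and a large probe side, most probe rows are rejected 
by `might_contain` during the scan, which is where the saving comes from. In a 
cross-process design, `bloom` would have to be serialized, shipped to the 
scanning process, and made available before the scan starts, which is the 
extra complexity mentioned above.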
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

