zhangyue-hashdata commented on PR #724: URL: https://github.com/apache/cloudberry/pull/724#issuecomment-2517322395
> Looks interesting. And I have some questions to discuss.
>
> * Besides the seqscan, can the runtime filter apply to other types of scan, such as the index scan?
> * It looks like the runtime filter can only be used when the `hashjoin` node and the `seqscan` node run in the same process, which means the tables must have the same distribution policy on the join columns, or one of the tables must be replicated.

* Besides the seqscan, can the runtime filter apply to other types of scan, such as the index scan?

> Theoretically, it is feasible to apply runtime filters to operators such as Index Scan. However, because Index Scan already reduces data volume by leveraging an optimized storage structure, I think the performance gains from applying runtime filters to it would likely be minimal.
>
> In subsequent work, if we find that other scan operators gain notably from pushed-down runtime filters, we will support them. Our focus will be on operators where runtime filters can substantially reduce the amount of data processed early in query execution.

* It looks like the runtime filter can only be used when the `hashjoin` node and the `seqscan` node run in the same process, which means the tables must have the same distribution policy on the join columns, or one of the tables must be replicated.

> Yes, the current pushdown runtime filter only supports in-process pushdown, which means that the Hash Join and SeqScan need to be within the same process. The design and implementation of cross-process pushdown runtime filters are much more complex.
> This limitation arises because sharing runtime filter data structures, such as Bloom filters, across processes involves additional challenges: inter-process communication (IPC), synchronization, and keeping the filters consistent and efficient across process boundaries. Addressing these issues requires a more sophisticated design that can handle the complexities of a distributed environment.
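To make the in-process case concrete, here is a minimal sketch of a Bloom-filter runtime filter that a hash join builds and a sequential scan consults within one process. It is only an illustration under stated assumptions: `BloomFilter`, `seq_scan`, and `hash_join` are hypothetical names, the filter size and hash scheme are arbitrary, and none of this is the code in the PR.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; bit count and hash scheme are illustrative only."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def seq_scan(rows, key_of, runtime_filter=None):
    """Yield rows, skipping those the runtime filter proves cannot join."""
    for row in rows:
        if runtime_filter is not None and not runtime_filter.might_contain(key_of(row)):
            continue
        yield row


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Build the hash table and the filter, then probe through the filtered scan."""
    table = {}
    bloom = BloomFilter()
    for row in build_rows:
        table.setdefault(build_key(row), []).append(row)
        bloom.add(build_key(row))
    # Same-process pushdown: the scan reads the filter object directly,
    # so no IPC or synchronization is needed.
    for probe in seq_scan(probe_rows, probe_key, runtime_filter=bloom):
        for match in table.get(probe_key(probe), []):
            yield match, probe
```

Because a Bloom filter never reports a present key as absent, the filtered scan drops only rows that could not have matched, so the join result is unchanged. In this sketch the scan can read `bloom` only because it lives in the same address space as the join; a cross-process variant would have to ship the bit array to the scanning process and coordinate when it becomes valid, which is the extra complexity described above.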
