parisni commented on PR #8716: URL: https://github.com/apache/hudi/pull/8716#issuecomment-1564496057
No, AFAIK both Iceberg and Delta use bloom filters the same way the current PR does: Parquet push-down at read time. Worth mentioning that Spark has also provided a [bloom-filter-based join](https://github.com/apache/spark/pull/35789) since 3.3, but in that case the bloom filter is built on the fly and is not the Parquet bloom filter.

On May 26, 2023 5:19:25 AM UTC, Danny Chan wrote:
>Thanks for sharing. I think the Databricks BloomFilter index mainly serves query optimization purposes, right? Do they also use it to accelerate data skipping during data ingestion, aka the UPSERTS?
>
>--
>Reply to this email directly or view it on GitHub:
>https://github.com/apache/hudi/pull/8716#issuecomment-1563823699

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
