Hi Arnaud,

> those bloom filters are different from column Bloom filters as they are
> skip indices computed for a bunch of rows (configurable), what ClickHouse
> implementation calls a granule and I think that is equivalent a row group
> in Parquet.


It would probably be good to have clarity on exactly what is being
proposed.  Is this something you are interested in contributing?  It would
be good to review the contribution guidelines for new features [1].  Based
on the definitions linked, I think this might be useful and is limited
enough in scope to be viable.

IMO, given the slightly esoteric nature of what is being proposed, it would
be nice to see this integrated with an open source query engine to
demonstrate its usefulness. Did you have one in mind? (Note I don't think
we should be aiming for ClickHouse compatibility here, as Parquet has
already defined its own Bloom filter, but this would be covered in discussion
or a more detailed design.)

We are also trying to write up more concrete guidance on how to move
forward in adding features; this will likely be another e-mail thread on
this list.

Thanks,
Micah

[1]
https://github.com/apache/parquet-format/blob/master/CONTRIBUTING.md#additionschanges-to-the-format

On Mon, Apr 21, 2025 at 9:16 PM Arnaud Adant <aad...@jumptrading.com.invalid>
wrote:

> Hi guys,
>
> Gang Wu (@wgtmac<https://github.com/wgtmac>) suggested that I reach out
> to this list.
>
> https://github.com/apache/parquet-format/issues/489 (support for n-gram
> Bloom filters)
> https://github.com/apache/parquet-format/issues/490 (support for token
> Bloom filters)
>
> those bloom filters are different from column Bloom filters as they are
> skip indices computed for a bunch of rows (configurable), what ClickHouse
> implementation calls a granule and I think that is equivalent a row group
> in Parquet.
>
> Let me know if this makes sense.
>
> Best regards,
>
> Arnaud