[ https://issues.apache.org/jira/browse/ARROW-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288556#comment-17288556 ]
Ben Kietzman commented on ARROW-11384:
--------------------------------------
The first step here would be providing a compute function which tests inputs
against a Bloom filter. This function could then be referenced by (for example)
the expressions extracted from row group statistics. Finally, a special case
would be added to expression simplification to test whether a filter can be
satisfied given a Bloom filter. For example:
{code}
SimplifyGivenGuarantee(equal(field_ref("a"), literal(1)),
                       bloom_filter(field_ref("a"), ...))
{code}
would either return {{literal(false)}} to indicate that the filter is
unsatisfiable or pass through {{equal(field_ref("a"), literal(1))}} to indicate
that the Bloom filter does not exclude the value 1.
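
To make the two outcomes concrete, here is a minimal standalone sketch of the
decision that the special case would make. It does not use any Arrow or Parquet
API: {{ToyBloomFilter}}, {{Simplified}} and {{SimplifyEqualGivenBloomFilter}}
are hypothetical names invented for illustration, the filter only handles
64-bit integers, and the hashing scheme is an arbitrary choice rather than the
one the parquet spec defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Arbitrary seeds for three hash probes (illustrative; not from the parquet spec).
constexpr std::uint64_t kSeeds[] = {0x9E3779B97F4A7C15ULL,
                                    0xC2B2AE3D27D4EB4FULL,
                                    0x165667B19E3779F9ULL};

// Hypothetical toy Bloom filter over int64 values.
class ToyBloomFilter {
 public:
  explicit ToyBloomFilter(std::size_t num_bits) : bits_(num_bits, false) {
    assert(num_bits > 0);
  }

  void Insert(std::int64_t value) {
    for (std::uint64_t seed : kSeeds) bits_[Index(value, seed)] = true;
  }

  // false: the value is definitely absent. true: the value may be present.
  bool MightContain(std::int64_t value) const {
    for (std::uint64_t seed : kSeeds) {
      if (!bits_[Index(value, seed)]) return false;
    }
    return true;
  }

 private:
  // Mix the seeded value (splitmix64-style finalizer) and map it to a bit index.
  std::size_t Index(std::int64_t value, std::uint64_t seed) const {
    std::uint64_t h = static_cast<std::uint64_t>(value) ^ seed;
    h ^= h >> 30;
    h *= 0xBF58476D1CE4E5B9ULL;
    h ^= h >> 27;
    h *= 0x94D049BB133111EBULL;
    h ^= h >> 31;
    return static_cast<std::size_t>(h % bits_.size());
  }

  std::vector<bool> bits_;
};

// Hypothetical stand-in for the two possible simplification results.
enum class Simplified {
  kAlwaysFalse,  // corresponds to returning literal(false)
  kUnchanged     // corresponds to passing the equality through untouched
};

// Simplify `field == literal` given a Bloom filter built over that field.
Simplified SimplifyEqualGivenBloomFilter(std::int64_t literal,
                                         const ToyBloomFilter& guarantee) {
  return guarantee.MightContain(literal) ? Simplified::kUnchanged
                                         : Simplified::kAlwaysFalse;
}
```

Because a Bloom filter has no false negatives, {{kAlwaysFalse}} is only ever
returned when the value really is absent, so skipping the row group is safe.
{{kUnchanged}} may be a false positive, which is why the original predicate
must be kept and evaluated against the data.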
> [C++][Dataset] Support bloom filters in predicate pushdown
> ----------------------------------------------------------
>
> Key: ARROW-11384
> URL: https://issues.apache.org/jira/browse/ARROW-11384
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Ben Kietzman
> Priority: Major
> Labels: dataset, parquet
>
> The parquet spec includes bloom filters which can be useful during
> filtration. In the context of dataset::, this would be expressed as
> additional parquet statistics expressions on each row group, allowing
> entirely-excluded row groups to be skipped more aggressively.
> Prerequisite: https://issues.apache.org/jira/browse/PARQUET-1327
> (reader/writer support for bloom filters)