westonpace commented on issue #33683:
URL: https://github.com/apache/arrow/issues/33683#issuecomment-1545920015
> Maybe it’s an overkill but would using the filter subset of substrait work?
That is probably overkill, though it would work if someone wanted to do it. I
believe bloom filters are only useful for equality / inequality, while the
statistics support comparisons, so you probably only need =, !=, <, >, <=, >=.
The simplest approach might be to do what we used to do for the old Python
datasets and accept disjunctive normal form:
> Predicates are expressed using an Expression or using the disjunctive
> normal form (DNF), like [[('x', '=', 0), ...], ...]. DNF allows arbitrary
> boolean logical combinations of single column predicates. The innermost tuples
> each describe a single column predicate. The list of inner predicates is
> interpreted as a conjunction (AND), forming a more selective and multiple
> column predicate. Finally, the most outer list combines these filters as a
> disjunction (OR).
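
As a rough illustration of the DNF shape described above, here is a minimal pure-Python sketch. It is not the pyarrow or Arrow C++ implementation: the names `OPS`, `matches`, and `may_match` are hypothetical, the `stats` layout (column name mapped to a `(min, max)` pair) is an assumption, and nulls are ignored. `matches` evaluates a DNF filter against a single row, and `may_match` shows how min/max statistics alone could rule out a row group for the six comparison operators.

```python
import operator

# Hypothetical helper names for illustration; not part of Arrow or pyarrow.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def matches(dnf, row):
    """Return True if `row` (dict of column -> value) satisfies the DNF filter.

    The outer list is an OR of conjunctions; each inner list is an AND of
    (column, op, value) tuples.
    """
    return any(
        all(OPS[op](row[col], value) for col, op, value in conjunction)
        for conjunction in dnf
    )


def may_match(dnf, stats):
    """Return False only if min/max statistics prove that no row can match.

    `stats` maps column -> (min, max) for one row group (assumed layout).
    Nulls are not modelled in this sketch.
    """

    def predicate_may_match(col, op, value):
        if col not in stats:
            return True  # no statistics, so nothing can be ruled out
        lo, hi = stats[col]
        if op == "=":
            return lo <= value <= hi
        if op == "!=":
            # Only impossible when every value equals `value`.
            return not (lo == hi == value)
        if op == "<":
            return lo < value
        if op == "<=":
            return lo <= value
        if op == ">":
            return hi > value
        if op == ">=":
            return hi >= value
        raise ValueError(f"unsupported operator: {op!r}")

    return any(
        all(predicate_may_match(*pred) for pred in conjunction)
        for conjunction in dnf
    )
```

For example, with `dnf = [[("x", "=", 0), ("y", ">", 5)], [("z", "!=", "a")]]`, the row `{"x": 0, "y": 6, "z": "a"}` matches through the first conjunction, and a row group whose statistics are `{"x": (0, 5)}` can be skipped for the filter `[[("x", "=", 10)]]`.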