lidavidm commented on PR #12891: URL: https://github.com/apache/arrow/pull/12891#issuecomment-1103235600
> This looks great, thanks for figuring this out. It seems there would be some advantage whenever I filter parquet files with an equality to add is_valid if that column might contain nulls. For example:
>
> `(ds.field(x) < 10) & is_valid(ds.field(x))` will eliminate a row group with min 12 and null_count > 0 where `ds.field(x) < 10` will not (although the filtering will be very fast we will still have to decode the row group).
>
> I don't know if this is worth documenting somewhere or if it is too obscure to include.

Hmm. I guess we are treating guarantees and filters differently. `x < 10` as a guarantee implies `is_valid(x)`, but not as a filter. We may want to fix that, but that would also be a drastic change.
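To make the guarantee/filter asymmetry concrete, here is a rough toy model in plain Python. It is not Arrow's implementation: `RowGroupStats`, `less_than`, `is_valid`, `conj`, and `can_skip` are all hypothetical names, and the only assumption baked in is the behavior described above, namely that a comparison is left unsimplified for rows that may be null, while `is_valid` simplifies to false for them.

```python
from dataclasses import dataclass

# Toy model only: every name here is hypothetical and none of it is Arrow's API.

@dataclass
class RowGroupStats:
    """Stand-in for the Parquet statistics of one column in one row group."""
    min: int
    max: int
    null_count: int

# Outcome of simplifying a predicate against one branch of the guarantee.
FALSE = "false"      # predicate can never hold on this branch
UNKNOWN = "unknown"  # predicate could not be simplified away

def less_than(limit, branch, stats):
    """Simplify `x < limit` against one guarantee branch."""
    if branch == "null":
        # Assumed behavior from the discussion: the comparison is not
        # simplified for rows that may be null.
        return UNKNOWN
    return FALSE if stats.min >= limit else UNKNOWN

def is_valid(branch, stats):
    """Simplify `is_valid(x)` against one guarantee branch."""
    return FALSE if branch == "null" else UNKNOWN

def conj(a, b):
    """A conjunction is false as soon as either side is false."""
    return FALSE if FALSE in (a, b) else UNKNOWN

def can_skip(stats, limit, with_is_valid):
    """A row group is skipped only if the filter is false on every branch."""
    branches = ["range"]
    if stats.null_count > 0:
        branches.append("null")
    for branch in branches:
        result = less_than(limit, branch, stats)
        if with_is_valid:
            result = conj(result, is_valid(branch, stats))
        if result != FALSE:
            return False
    return True

stats = RowGroupStats(min=12, max=20, null_count=3)
print(can_skip(stats, 10, with_is_valid=False))  # plain `x < 10`
print(can_skip(stats, 10, with_is_valid=True))   # `(x < 10) & is_valid(x)`
```

In this model the row group with min 12 and a non-zero null count is kept for the plain `x < 10` filter and skipped once `is_valid(x)` is added, which mirrors the case in the quoted comment. Whether Arrow should infer `is_valid(x)` from the filter itself is the open question raised above.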