lidavidm commented on PR #12891:
URL: https://github.com/apache/arrow/pull/12891#issuecomment-1103235600

   > This looks great, thanks for figuring this out. It seems there would be
   > some advantage to adding `is_valid` whenever I filter Parquet files with a
   > comparison on a column that might contain nulls. For example:
   > 
   > `(ds.field(x) < 10) & is_valid(ds.field(x))` will eliminate a row group
   > with min 12 and null_count > 0, whereas `ds.field(x) < 10` alone will not.
   > The filtering itself will be very fast, but we will still have to decode
   > the row group.
   > 
   > I don't know if this is worth documenting somewhere or if it is too
   > obscure to include.
   
   Hmm. I guess we are treating guarantees and filters differently: `x < 10` as
a guarantee implies `is_valid(x)`, but as a filter it does not. We may want to
fix that, but it would be a drastic change.
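
To make the asymmetry concrete, here is a minimal toy model in plain Python. It is not Arrow's implementation: `RowGroupStats` and `can_skip_less_than` are hypothetical names, the `max` of 20 is made up (only min 12 and null_count > 0 come from the example above), and the row-group guarantee is assumed to look like `(min <= x <= max) or is_null(x)` when nulls are present.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for Parquet row-group statistics on column x."""
    min: int
    max: int
    null_count: int


def can_skip_less_than(stats: RowGroupStats, bound: int, requires_valid: bool) -> bool:
    """Toy model: can the filter `x < bound` be proven unsatisfiable from stats?

    Assumed guarantee: `(min <= x <= max) or is_null(x)` when null_count > 0,
    and just `min <= x <= max` otherwise.
    """
    # Non-null values can match only if the smallest one is below the bound.
    values_may_match = stats.min < bound
    # Without an explicit is_valid(x), this model does not rule out null rows.
    nulls_may_match = stats.null_count > 0 and not requires_valid
    return not (values_may_match or nulls_may_match)


# Row group from the example above: min 12, some nulls (max is an assumption).
stats = RowGroupStats(min=12, max=20, null_count=3)
skip_plain = can_skip_less_than(stats, 10, requires_valid=False)
skip_with_is_valid = can_skip_less_than(stats, 10, requires_valid=True)
print(skip_plain, skip_with_is_valid)
```

In this model the plain filter cannot skip the row group, while the filter combined with `is_valid` can. In pyarrow the second form would presumably be written as `(ds.field("x") < 10) & ds.field("x").is_valid()`.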


