isidentical commented on PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#issuecomment-1284600945
That is definitely an interesting point of view 👀 I was thinking in a more restricted way, about what the filter's outcome would be (what values of `a` can remain after we execute `a < 5`, hence `[0, 4]`), but I also see your point. I think it also strongly relates to what an `ExprBoundaries` is (and what else we collect besides it, like the discussion in https://github.com/apache/arrow-datafusion/pull/3868#discussion_r999946718).

> I would expect the output boundaries for a < 5 to be `ExprBoundaries {min: true, max:true}`

My only worry is that a technically correct version should produce `ExprBoundaries {min: false, max: true}`, since we don't know what `a` is. We know the expression can be `true` (since `min(a) < 5`), but we also know it can be `false` (since `max(a) >= 5`). So I'm not sure how useful that will be. But if we have some sort of statistics context (and boundary aggregator), I think this might well be possible. As in, we would record what `a` could be at a different level than what the expression evaluates to.

I think I understand it in general 👍🏻 (sorry for the confusing comment above, I was thinking of something entirely different).
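To make the two readings concrete, here is a rough sketch of the distinction (not the PR's actual code). `Bounds`, `lt_expr_bounds`, and `lt_filtered_bounds` are hypothetical names made up for illustration, standing in for `ExprBoundaries`, and the input range `[0, 10]` for `a` is an assumption:

```rust
// Hypothetical stand-in for `ExprBoundaries`; not the real DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bounds<T> {
    pub min: T,
    pub max: T,
}

/// Reading 1: bounds of the boolean expression `a < lit` itself.
/// With `false < true`, `min` is `false` whenever some row can fail the
/// predicate, and `max` is `true` whenever some row can pass it.
pub fn lt_expr_bounds(a: Bounds<i64>, lit: i64) -> Bounds<bool> {
    let can_be_true = a.min < lit; // the smallest `a` satisfies `a < lit`
    let can_be_false = a.max >= lit; // the largest `a` violates `a < lit`
    Bounds {
        min: !can_be_false,
        max: can_be_true,
    }
}

/// Reading 2: bounds of the column `a` among rows that pass `a < lit`.
/// Returns `None` when no row can pass the filter.
pub fn lt_filtered_bounds(a: Bounds<i64>, lit: i64) -> Option<Bounds<i64>> {
    if a.min >= lit {
        return None;
    }
    // `lit - 1` cannot overflow here: `a.min < lit` implies `lit > i64::MIN`.
    Some(Bounds {
        min: a.min,
        max: a.max.min(lit - 1),
    })
}
```

Under these assumptions, for `a` in `[0, 10]` and the predicate `a < 5`, the first function gives `{min: false, max: true}` while the second gives `[0, 4]`, which is why the two would probably need to be recorded at different levels.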
