[GitHub] [arrow-datafusion] isidentical commented on pull request #3868: Implement foundational filter selectivity analysis

GitBox Wed, 19 Oct 2022 14:38:01 -0700


isidentical commented on PR #3868:
URL: https://github.com/apache/arrow-datafusion/pull/3868#issuecomment-1284600945


   That is definitely an interesting point of view 👀 I was thinking more 
narrowly, in terms of what the filter's outcome would be (what values `a` can 
take after we execute `a < 5`, hence `[0, 4]`), but I also see your point. 
I think it also depends heavily on what an `ExprBoundaries` is (and what else 
we collect besides it, as in the discussion in 
https://github.com/apache/arrow-datafusion/pull/3868#discussion_r999946718).
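
To make the narrower reading concrete, here is a minimal sketch of how a filter like `a < 5` shrinks the range of `a` itself. `ColumnBounds` and `narrow_lt` are hypothetical names for illustration, not DataFusion's actual `ExprBoundaries` API, and the sketch assumes inclusive integer bounds.

```rust
/// Hypothetical inclusive integer bounds for a column.
/// This is an illustration only, not DataFusion's `ExprBoundaries`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ColumnBounds {
    min: i64,
    max: i64,
}

/// Bounds of the values that survive the filter `col < limit`,
/// or `None` if no value in the range can pass.
fn narrow_lt(col: ColumnBounds, limit: i64) -> Option<ColumnBounds> {
    // Largest integer strictly below `limit`; `None` if `limit` is i64::MIN.
    let upper = limit.checked_sub(1)?;
    if col.min > upper {
        // Every value is >= limit, so the filter removes all rows.
        return None;
    }
    Some(ColumnBounds {
        min: col.min,
        max: col.max.min(upper),
    })
}
```

Assuming `a` starts out in `[0, 10]` (the upper bound is made up for the example), `narrow_lt(ColumnBounds { min: 0, max: 10 }, 5)` yields `Some(ColumnBounds { min: 0, max: 4 })`, which is the `[0, 4]` mentioned above.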
   
   > I would expect the output boundaries for a < 5 to be `ExprBoundaries  
{min: true, max:true}`
   
   My only worry is that a technically correct version should produce 
`ExprBoundaries  {min: false, max:true}` since we don't know what `a` is. We 
know the expression can be `true` (since `min(a) < 5`) but we also know it can 
be `false` (`max(a) >= 5`). So I am not sure how useful that will be. But if we 
have some sort of statistics context (and boundary aggregator), I think this 
might well be possible.
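
The reasoning above can be sketched as a small function that derives the boolean bounds of `a < limit` from the bounds of `a`. `BoolBounds` and `lt_bounds` are hypothetical names, and the ordering `false < true` is an assumption of this sketch rather than something taken from the PR.

```rust
/// Hypothetical bounds of a boolean expression: the smallest and largest
/// value it can evaluate to, treating `false < true`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BoolBounds {
    min: bool,
    max: bool,
}

/// Bounds of `col < limit` when `col` ranges over the inclusive
/// interval [col_min, col_max].
fn lt_bounds(col_min: i64, col_max: i64, limit: i64) -> BoolBounds {
    BoolBounds {
        // The predicate is always true only if even the largest value passes.
        min: col_max < limit,
        // The predicate can be true if at least the smallest value passes.
        max: col_min < limit,
    }
}
```

With `a` in `[0, 10]`, `lt_bounds(0, 10, 5)` gives `{min: false, max: true}`. Evaluated against the already-filtered range `[0, 4]`, `lt_bounds(0, 4, 5)` gives `{min: true, max: true}`, which is the result quoted above.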
   
   As in, we would record what `a` could be at a different level from what the 
expression evaluates to. I think I understand it in general 👍🏻 (sorry for the 
confusing comment above, I was thinking of something entirely different.)
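
One way to picture the two levels is a rough sketch where a context object holds the column ranges, while the analysis of the expression returns the bounds of its boolean result. `StatsContext` and `analyze_lt` are invented for this illustration and do not correspond to anything in the PR or in DataFusion.

```rust
use std::collections::HashMap;

/// Hypothetical statistics context: known inclusive (min, max) per column.
#[derive(Debug, Default)]
struct StatsContext {
    columns: HashMap<String, (i64, i64)>,
}

impl StatsContext {
    /// Analyze `column < limit`.
    /// Returns the (min, max) of the boolean result, and separately narrows
    /// the recorded column range to the values that can pass the filter.
    /// Returns `None` if nothing is known about the column.
    fn analyze_lt(&mut self, column: &str, limit: i64) -> Option<(bool, bool)> {
        let (min, max) = *self.columns.get(column)?;
        let result = (max < limit, min < limit);
        // Narrow only when some value can pass and `limit - 1` does not overflow.
        if let Some(upper) = limit.checked_sub(1) {
            if min <= upper {
                self.columns
                    .insert(column.to_string(), (min, max.min(upper)));
            }
        }
        Some(result)
    }
}
```

Starting from `a` in `(0, 10)`, calling `analyze_lt("a", 5)` returns `Some((false, true))` for the expression and leaves `(0, 4)` recorded for `a` in the context.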
   
   


