isidentical commented on PR #3912:
URL: https://github.com/apache/arrow-datafusion/pull/3912#issuecomment-1303915844

   Marking the PR as a draft until we reach consensus on the API design.
@alamb @Dandandan how do we feel about:
   - `fn boundaries(&self, context: &mut AnalysisContext) -> Option<ExprBoundaries>;`
   - `fn apply_boundaries(&self, context: &mut AnalysisContext, boundaries: &ExprBoundaries);`
   
   I know there are still some reservations about the context, but I am out
of ideas on how to make it simpler while preserving the same set of features:
allowing dynamic column boundary updates from sub-expressions, and being able
to fork the context at certain points (like ORs) while still delegating all of
this information back to the call site (so not only to sub-expressions, but
also to the expression that is calling us). I'd be happy to try any
suggestions, though.
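
   To make the shape of the proposal concrete, here is a minimal sketch of how
the two methods and a forkable context could fit together. Only the two method
signatures come from the proposal above; the trait name `Analyze`, the fields
of `ExprBoundaries` and `AnalysisContext`, the `Column` type, and `fork_demo`
are hypothetical stand-ins for illustration, not the actual code in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for `ExprBoundaries`.
#[derive(Debug, Clone, PartialEq)]
struct ExprBoundaries {
    min: i64,
    max: i64,
    distinct: Option<usize>,
}

/// Hypothetical context: the currently known boundaries per column name.
/// Deriving `Clone` is what makes forking (e.g. per OR branch) cheap to express.
#[derive(Debug, Clone, Default)]
struct AnalysisContext {
    columns: HashMap<String, ExprBoundaries>,
}

/// Hypothetical trait carrying the two proposed methods.
trait Analyze {
    fn boundaries(&self, context: &mut AnalysisContext) -> Option<ExprBoundaries>;
    fn apply_boundaries(&self, context: &mut AnalysisContext, boundaries: &ExprBoundaries);
}

/// Hypothetical column reference expression.
struct Column(String);

impl Analyze for Column {
    // A column reports whatever the context currently knows about it.
    fn boundaries(&self, context: &mut AnalysisContext) -> Option<ExprBoundaries> {
        context.columns.get(&self.0).cloned()
    }

    // A parent expression (e.g. a comparison) can narrow the column's range.
    fn apply_boundaries(&self, context: &mut AnalysisContext, boundaries: &ExprBoundaries) {
        context.columns.insert(self.0.clone(), boundaries.clone());
    }
}

/// Forks the context for one branch and shows that the original is untouched.
fn fork_demo() -> (Option<ExprBoundaries>, Option<ExprBoundaries>) {
    let mut ctx = AnalysisContext::default();
    let a = Column("a".to_string());
    a.apply_boundaries(&mut ctx, &ExprBoundaries { min: 1, max: 100, distinct: None });

    // Fork: updates made while analyzing one branch stay local to the clone.
    let mut branch = ctx.clone();
    a.apply_boundaries(&mut branch, &ExprBoundaries { min: 50, max: 100, distinct: None });

    (a.boundaries(&mut ctx), a.boundaries(&mut branch))
}
```

   In this sketch the caller decides what to do with a forked context once a
branch has been analyzed (merge it back, or discard it), which is the part of
the design that is still open.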
   
   @mingmwang on the point of discrete intervals: yes, it is a matter of
histograms. Currently, a non-uniform distribution is represented as a uniform
range: `a in (1, 2, 24, 25)` would be represented as
`{min=1, max=25, distinct=4}`, which implies a distribution of `1, 9, 17, 25`,
so it is not as precise. In the following phases, I hope to add a
`ValueDistribution` construct to `ExprBoundaries` so that we have a
smaller-scoped histogram with all these ranges, but we are not there yet.
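
   The arithmetic behind that example can be sketched as follows. The helper
`implied_uniform_values` is hypothetical and only illustrates how
`{min, max, distinct}` reads as evenly spaced values; it is not how the PR
computes anything.

```rust
/// Hypothetical helper: the evenly spaced values implied by a
/// `{min, max, distinct}` summary under a uniform-distribution assumption.
fn implied_uniform_values(min: i64, max: i64, distinct: usize) -> Vec<i64> {
    match distinct {
        0 => Vec::new(),
        1 => vec![min],
        n => {
            // `n` distinct values split the range into `n - 1` equal steps.
            let steps = (n - 1) as i64;
            (0..=steps).map(|i| min + (max - min) * i / steps).collect()
        }
    }
}
```

   For `{min=1, max=25, distinct=4}` this yields `[1, 9, 17, 25]`, whereas the
actual values were `1, 2, 24, 25`.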


