isidentical commented on PR #3912: URL: https://github.com/apache/arrow-datafusion/pull/3912#issuecomment-1303915844
Marking the PR as a draft until we reach a consensus on the API design. @alamb @Dandandan how do we feel about:

- `fn boundaries(&self, context: &mut AnalysisContext) -> Option<ExprBoundaries>;`
- `fn apply_boundaries(&self, context: &mut AnalysisContext, boundaries: &ExprBoundaries);`

I know there are still some reservations on the context side, but I am out of ideas on how to make it simpler while preserving the same set of features: allowing dynamic column boundary updates from sub-expressions, and being able to fork the context at certain points (like ORs) while still being able to delegate all this information back to the call site (so not only to sub-expressions but also to the expression that is calling us). I'd be happy to try any suggestions though.

@mingmwang on the point of discrete intervals: yes, it is a matter of histograms. Currently, a non-uniform distribution is represented as a uniform range (`a in (1, 2, 24, 25)` would be represented as `{min=1, max=25, distinct=4}`, which implies a distribution of `1, 9, 17, 25`, so it is not as precise). In the following phases, I hope to add a `ValueDistribution` construct to `ExprBoundaries` so we would have a smaller-scoped histogram covering all these ranges, but we are not there yet.
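To make the precision loss in the `a in (1, 2, 24, 25)` example concrete, here is a minimal sketch of the arithmetic. `UniformRange`, `from_values`, and `implied_values` are hypothetical names used only for illustration; they are a simplified stand-in and not the actual `ExprBoundaries` type from the PR.

```rust
/// Hypothetical, simplified stand-in for the statistics discussed above.
/// This is NOT DataFusion's `ExprBoundaries`; it only keeps the three
/// numbers mentioned in the comment: min, max, and distinct count.
#[derive(Debug, Clone, PartialEq)]
struct UniformRange {
    min: i64,
    max: i64,
    distinct: usize,
}

impl UniformRange {
    /// Summarize a set of values as (min, max, distinct count).
    /// Returns `None` for an empty input.
    fn from_values(values: &[i64]) -> Option<Self> {
        let min = *values.iter().min()?;
        let max = *values.iter().max()?;
        // Count distinct values by sorting a copy and removing duplicates.
        let mut sorted = values.to_vec();
        sorted.sort_unstable();
        sorted.dedup();
        Some(Self { min, max, distinct: sorted.len() })
    }

    /// Values implied by assuming the distinct values are evenly spaced
    /// between `min` and `max`. Uses integer division, so the spacing is
    /// exact only when `max - min` is divisible by `distinct - 1`.
    fn implied_values(&self) -> Vec<i64> {
        match self.distinct {
            0 => vec![],
            1 => vec![self.min],
            n => {
                let step = (self.max - self.min) / (n as i64 - 1);
                (0..n as i64).map(|i| self.min + i * step).collect()
            }
        }
    }
}
```

With this sketch, `UniformRange::from_values(&[1, 2, 24, 25])` yields `{min: 1, max: 25, distinct: 4}`, and `implied_values()` on it gives `[1, 9, 17, 25]`: the step is `(25 - 1) / (4 - 1) = 8`, so the two clusters around 1 and 25 are no longer visible.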