alamb opened a new pull request, #13293: URL: https://github.com/apache/datafusion/pull/13293
## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/8078 - Related to https://github.com/apache/datafusion/issues/10316 # Discussion: This is a Request for Comment (and maybe also a POC) I am going to hack out another PR that takes a slightly different approach ( `Precision::Interval`) and see how much better/worse that is. ## Rationale for this change For the analysis described on https://github.com/apache/datafusion/issues/10316, we need to know if a value is constrained to a range to avoid merging. However the current `Statistics` are either `Exact` or `Inexact`, so once the precision becomes Inexact we lose the information that the possible minimum / maximum values do not change. This came up twice for me recently: 1. @suremarc, @findepi and I spoke about this here: https://github.com/apache/datafusion/issues/8227#issuecomment-2458134570 where we need to know if a value is constrained to a range to avoid merging (see https://github.com/apache/datafusion/issues/10316) 2. Yesterday I happened to be working on code in InfluxDB 3.0 that relies on knowing min/max values and I hit https://github.com/influxdata/arrow-datafusion/commit/30d4368127db80c125a1adea22a955a496a516df which marks the statistics as "inexact" when filter pushdown is on for parquet, but that loses the key information that the possible minimum / maximum values do not change I hacked around it downstream, but I think this is all sounding like it is time to add a new `Precision` enum that allows for this usecase ## What changes are included in this PR? - [x] Introduce `Precision::AtMost`, `Precision::AtLeast` - [x] Add `ColumnStatistics::min` and `ColumnStatistics::max` that return the correct value - [ ] Update code to handle new precision - [ ] Update ParquetExec to use the new Precision variants when filters are on ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
