berkaysynnada commented on PR #14699: URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2672689718
> so the min/max/ndv of ColumnStatistic will be removed, the ColumnStatistics will like this: pub struct ColumnStatistics{stat: StatisticsV2}, right?

No, it will look like this:

```
pub struct ColumnStatistics {
    /// Number of null values on column
    pub null_count: StatisticsV2<usize>,
    /// Maximum value of column
    pub max_value: StatisticsV2<ScalarValue>,
    /// Minimum value of column
    pub min_value: StatisticsV2<ScalarValue>,
    /// Sum value of a column
    pub sum_value: StatisticsV2<ScalarValue>,
    /// Number of distinct values
    pub distinct_count: StatisticsV2<usize>,
}
```

> Have we done any work on the accuracy of the new statistical information during cardinality estimation?

Cardinality is a term related to intervals, and we already have a function for cardinality calculations as a method of the `Interval` struct.

> Are there certain papers that describe this statistical information framework in more detail?

This framework provides five distribution variants: Uniform, Exponential, Gaussian, Bernoulli, and Unknown. The first four represent well-known probability distributions, while the Unknown variant serves as a fallback where the exact distribution type is unspecified. Even there, key statistical parameters such as mean, median, variance, and range can still be provided, since these parameters are already meaningful for optimization and decision-making.

If you need specific details about these distribution types or their parameters, you can refer to the links provided in the docstrings. If you are also interested in exploring how they interact (PDF computations), I can suggest Wolfram Mathematica.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
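
As a rough illustration of the reply above, the five distribution variants and the per-field layout of `ColumnStatistics` could be sketched as follows. This is not the code from the PR: `DistSketch`, `ColumnStatsSketch`, their field names, and the `mean`/`variance` helpers are hypothetical, and plain `f64` parameters stand in for `ScalarValue` and `usize` to keep the example free of dependencies. The formulas are the textbook mean and variance of each distribution.

```rust
/// Hypothetical, simplified stand-in for the `StatisticsV2` type discussed
/// in the comment. All names and fields here are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum DistSketch {
    /// Continuous uniform distribution on [low, high].
    Uniform { low: f64, high: f64 },
    /// Exponential distribution with the given rate, shifted by `offset`.
    Exponential { rate: f64, offset: f64 },
    /// Normal distribution.
    Gaussian { mean: f64, variance: f64 },
    /// Bernoulli distribution with success probability `p`.
    Bernoulli { p: f64 },
    /// Fallback: distribution type unspecified, summary parameters optional.
    Unknown {
        mean: Option<f64>,
        median: Option<f64>,
        variance: Option<f64>,
        range: Option<(f64, f64)>,
    },
}

impl DistSketch {
    /// Mean of the distribution, if it is known.
    pub fn mean(&self) -> Option<f64> {
        match self {
            DistSketch::Uniform { low, high } => Some((low + high) / 2.0),
            DistSketch::Exponential { rate, offset } => Some(offset + 1.0 / rate),
            DistSketch::Gaussian { mean, .. } => Some(*mean),
            DistSketch::Bernoulli { p } => Some(*p),
            DistSketch::Unknown { mean, .. } => *mean,
        }
    }

    /// Variance of the distribution, if it is known.
    pub fn variance(&self) -> Option<f64> {
        match self {
            DistSketch::Uniform { low, high } => Some((high - low).powi(2) / 12.0),
            DistSketch::Exponential { rate, .. } => Some(1.0 / (rate * rate)),
            DistSketch::Gaussian { variance, .. } => Some(*variance),
            DistSketch::Bernoulli { p } => Some(p * (1.0 - p)),
            DistSketch::Unknown { variance, .. } => *variance,
        }
    }
}

/// Hypothetical mirror of the proposed `ColumnStatistics`: every field
/// carries its own distribution instead of a single exact or inexact value.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnStatsSketch {
    pub null_count: DistSketch,
    pub max_value: DistSketch,
    pub min_value: DistSketch,
    pub sum_value: DistSketch,
    pub distinct_count: DistSketch,
}
```

The `Unknown` variant is what lets a statistic stay useful when only partial information exists: a caller can still read a mean or a range from it without knowing the shape of the distribution.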