berkaysynnada commented on PR #14699:
URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2672689718
> so the min/max/ndv of ColumnStatistic will be removed, the
> ColumnStatistics will like this: pub struct ColumnStatistics{stat:
> StatisticsV2}, right?
No, it will look like this:
```
pub struct ColumnStatistics {
    /// Number of null values on column
    pub null_count: StatisticsV2<usize>,
    /// Maximum value of column
    pub max_value: StatisticsV2<ScalarValue>,
    /// Minimum value of column
    pub min_value: StatisticsV2<ScalarValue>,
    /// Sum value of a column
    pub sum_value: StatisticsV2<ScalarValue>,
    /// Number of distinct values
    pub distinct_count: StatisticsV2<usize>,
}
```
> Have we done any work on the accuracy of the new statistical information
> during cardinality estimation?
Cardinality is a concept tied to intervals, and we already have a method on the
`Interval` struct for cardinality calculations.
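As a rough illustration of what "cardinality of an interval" means, here is a minimal sketch for closed integer intervals. `IntInterval` and its `cardinality` method are hypothetical names made up for this example; this is not DataFusion's `Interval` implementation, which works on `ScalarValue` bounds.

```rust
/// Hypothetical closed integer interval `[lower, upper]`.
/// This is an illustrative stand-in, not DataFusion's `Interval`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IntInterval {
    pub lower: i64,
    pub upper: i64,
}

impl IntInterval {
    /// Number of distinct integers contained in the interval.
    /// Returns `None` if the interval is empty (`lower > upper`)
    /// or if the count does not fit in a `u64`.
    pub fn cardinality(&self) -> Option<u64> {
        if self.lower > self.upper {
            return None;
        }
        // Widen to i128 so that `upper - lower + 1` cannot overflow.
        let count = (self.upper as i128) - (self.lower as i128) + 1;
        if count > u64::MAX as i128 {
            None
        } else {
            Some(count as u64)
        }
    }
}
```

For example, under these assumptions `IntInterval { lower: 1, upper: 10 }` has a cardinality of 10, which is the kind of bound an optimizer can use as an upper limit on the number of distinct values.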
> Are there certain papers that describe this statistical information
> framework in more detail?
This framework provides five distribution variants: Uniform, Exponential,
Gaussian, Bernoulli, and Unknown. The first four represent well-known
probability distributions, while the Unknown variant serves as a fallback
when the exact distribution type is unspecified. Even then, key
statistical parameters such as mean, median, variance, and range can still be
provided, since these parameters are already meaningful for optimization and
decision-making.
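To make the shape of this concrete, here is a simplified sketch of such an enum. Everything in it is an assumption for illustration: the name `DistributionSketch`, the field names, and the use of plain `f64` values instead of `ScalarValue` are hypothetical and do not mirror the actual `StatisticsV2` definition in the PR.

```rust
/// Hypothetical, simplified model of the five variants described above.
/// Field names and the use of `f64` are assumptions for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum DistributionSketch {
    /// Uniform over the closed range `[low, high]`.
    Uniform { low: f64, high: f64 },
    /// Exponential with the given rate (lambda > 0).
    Exponential { rate: f64 },
    /// Gaussian with the given mean and variance.
    Gaussian { mean: f64, variance: f64 },
    /// Bernoulli with success probability `p`.
    Bernoulli { p: f64 },
    /// Fallback: distribution type unknown, but summary
    /// parameters may still be available.
    Unknown {
        mean: Option<f64>,
        median: Option<f64>,
        variance: Option<f64>,
        range: Option<(f64, f64)>,
    },
}

impl DistributionSketch {
    /// Mean of the distribution, if it is known.
    pub fn mean(&self) -> Option<f64> {
        match self {
            Self::Uniform { low, high } => Some((low + high) / 2.0),
            Self::Exponential { rate } => Some(1.0 / rate),
            Self::Gaussian { mean, .. } => Some(*mean),
            Self::Bernoulli { p } => Some(*p),
            Self::Unknown { mean, .. } => *mean,
        }
    }

    /// Variance of the distribution, if it is known.
    pub fn variance(&self) -> Option<f64> {
        match self {
            Self::Uniform { low, high } => Some((high - low).powi(2) / 12.0),
            Self::Exponential { rate } => Some(1.0 / (rate * rate)),
            Self::Gaussian { variance, .. } => Some(*variance),
            Self::Bernoulli { p } => Some(p * (1.0 - p)),
            Self::Unknown { variance, .. } => *variance,
        }
    }
}
```

The point of the sketch is that the four named variants can derive their mean and variance from their parameters, whereas the Unknown variant can only report whatever summary values were supplied to it.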
If you need specifics about these distribution types or their
parameters, you can refer to the links provided in the docstrings.
If you'd also like to explore how the distributions interact (PDF
computations, for example), I can suggest Wolfram Mathematica.