berkaysynnada commented on PR #14699:
URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2672689718

   > So the min/max/ndv of ColumnStatistics will be removed, and 
ColumnStatistics will look like this: `pub struct ColumnStatistics { stat: 
StatisticsV2 }`, right?
   
   No, it will look like this:
   ```
   pub struct ColumnStatistics {
       /// Number of null values on column
       pub null_count: StatisticsV2<usize>,
       /// Maximum value of column
       pub max_value: StatisticsV2<ScalarValue>,
       /// Minimum value of column
       pub min_value: StatisticsV2<ScalarValue>,
       /// Sum value of a column
       pub sum_value: StatisticsV2<ScalarValue>,
       /// Number of distinct values
       pub distinct_count: StatisticsV2<usize>,
   }
   ```
   
   > Have we done any work on the accuracy of the new statistical information 
during cardinality estimation?
   
   Cardinality is a concept tied to intervals, and we already have a method 
on the `Interval` struct for cardinality calculations.
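
To make "cardinality of an interval" concrete, here is a minimal sketch for a closed integer interval. `IntInterval` and its fields are hypothetical stand-ins for illustration only; this is not DataFusion's `Interval` type, which works over `ScalarValue` bounds and may behave differently.

```rust
/// Hypothetical closed integer interval [lower, upper]; a stand-in for
/// illustration, not DataFusion's `Interval` type.
#[derive(Debug, Clone, Copy)]
pub struct IntInterval {
    pub lower: i64,
    pub upper: i64,
}

impl IntInterval {
    /// Number of distinct integers in the interval, or `None` if the
    /// bounds are inverted or the count does not fit in a `u64`.
    pub fn cardinality(&self) -> Option<u64> {
        if self.lower > self.upper {
            return None;
        }
        // Widen to i128 so `upper - lower + 1` cannot overflow.
        let count = (self.upper as i128) - (self.lower as i128) + 1;
        u64::try_from(count).ok()
    }
}
```

Under these assumptions, `[1, 10]` holds ten values and a single-point interval holds one; a count that cannot be represented is reported as `None` rather than guessed.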
   
   > Are there certain papers that describe this statistical information 
framework in more detail?
   
   The framework provides five distribution variants: Uniform, Exponential, 
Gaussian, Bernoulli, and Unknown. The first four represent well-known 
probability distributions, while Unknown serves as a fallback when the exact 
distribution type is unspecified. Even then, key statistical parameters such 
as mean, median, variance, and range can still be provided, since these 
parameters are already meaningful for optimization and decision-making.
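
As a rough illustration of the five variants described above, the sketch below models them as a Rust enum and derives the mean and variance from each variant's parameters using the standard textbook formulas. The name `Distribution`, the field names, and the use of plain `f64` values are assumptions made for this example; the actual types in the PR may be shaped differently.

```rust
/// Simplified stand-in for the distribution variants named above.
/// Names, fields, and the use of `f64` are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Distribution {
    Uniform { low: f64, high: f64 },
    Exponential { rate: f64 },
    Gaussian { mean: f64, variance: f64 },
    Bernoulli { p: f64 },
    /// Fallback: the distribution type is unspecified, but some summary
    /// statistics may still be known.
    Unknown {
        mean: Option<f64>,
        median: Option<f64>,
        variance: Option<f64>,
        range: Option<(f64, f64)>,
    },
}

impl Distribution {
    /// Mean of the distribution, if it can be determined.
    pub fn mean(&self) -> Option<f64> {
        match self {
            Distribution::Uniform { low, high } => Some((*low + *high) / 2.0),
            Distribution::Exponential { rate } => Some(1.0 / *rate),
            Distribution::Gaussian { mean, .. } => Some(*mean),
            Distribution::Bernoulli { p } => Some(*p),
            Distribution::Unknown { mean, .. } => *mean,
        }
    }

    /// Variance of the distribution, if it can be determined.
    pub fn variance(&self) -> Option<f64> {
        match self {
            Distribution::Uniform { low, high } => Some((*high - *low).powi(2) / 12.0),
            Distribution::Exponential { rate } => Some(1.0 / (*rate * *rate)),
            Distribution::Gaussian { variance, .. } => Some(*variance),
            Distribution::Bernoulli { p } => Some(*p * (1.0 - *p)),
            Distribution::Unknown { variance, .. } => *variance,
        }
    }
}
```

The point of the `Unknown` variant in this sketch is that callers can still ask for a mean or variance and get whatever was supplied, without the code pretending to know the distribution's shape.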
   
   If you need specific details about these distribution types or their 
parameters, you can refer to the links provided in the docstrings. 
Additionally, if you want to explore how the distributions interact, for 
example through PDF computations, I can suggest Wolfram Mathematica.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

