xudong963 opened a new issue, #15290:
URL: https://github.com/apache/datafusion/issues/15290

   ### Is your feature request related to a problem or challenge?
   
   I'm working on the ticket: https://github.com/apache/datafusion/issues/10316.
   
   Since we plan to replace all `Precision` with `Distribution` (https://github.com/synnada-ai/datafusion-upstream/pull/63), I'm assuming that statistics will use `Distribution` while I work on the design for #10316.
   
   The design has a step where statistics are merged, and that merge has to carry over to `Distribution`.
   
   The specific case is computing partition-level statistics: files are grouped into file groups, each file group is treated as a partition, and different partitions are processed in parallel. The partition-level statistics therefore come from merging the statistics of the files in a file group.
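   
   As a rough illustration of that merge for the simple fields only, here is a minimal sketch. `FileStats`, `merge`, and `partition_stats` are hypothetical names made up for this example; they are not DataFusion's `Statistics` API, and a single integer column is assumed.
   
   ```rust
   /// Hypothetical per-file summary for one integer column.
   /// This is an illustration only, not DataFusion's `Statistics` type.
   #[derive(Debug, Clone, Copy, PartialEq)]
   struct FileStats {
       num_rows: u64,
       min: i64,
       max: i64,
   }
   
   /// Merge two summaries: row counts add up, bounds widen.
   fn merge(a: FileStats, b: FileStats) -> FileStats {
       FileStats {
           num_rows: a.num_rows + b.num_rows,
           min: a.min.min(b.min),
           max: a.max.max(b.max),
       }
   }
   
   /// Fold all files of one file group into partition-level statistics.
   /// Returns `None` for an empty file group.
   fn partition_stats(files: &[FileStats]) -> Option<FileStats> {
       files.iter().copied().reduce(merge)
   }
   ```
   
   Because `merge` is associative, the files in a group can be folded in any order, and each partition can be computed independently of the others.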
   
   ### Describe the solution you'd like
   
   Create a function that takes two distributions and combines their statistical properties into a new distribution. The most appropriate approach is to create a `GenericDistribution` that approximates the mixture of the two input distributions.
   
   ```rust
   pub fn merge_distributions(a: &Distribution, b: &Distribution) -> Result<Distribution> {
       ...
   }
   ```
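   
   To make the "mixture" idea concrete, here is a minimal sketch of how the moments of a two-component mixture could be computed. `Summary` and `merge_summaries` are hypothetical names that stand in for whatever `Distribution` ends up exposing; none of this is the DataFusion API. The weights are assumed to be something like the row counts behind each input, which is a choice this issue does not specify.
   
   ```rust
   /// Hypothetical moment summary of one distribution.
   /// This is an illustration only, not a DataFusion type.
   #[derive(Debug, Clone, Copy, PartialEq)]
   struct Summary {
       /// Assumed weight of this component, e.g. its row count.
       weight: f64,
       mean: f64,
       variance: f64,
       min: f64,
       max: f64,
   }
   
   /// Moments of the mixture of `a` and `b`, weighted by `weight`.
   /// Returns `None` when the total weight is NaN or not positive.
   fn merge_summaries(a: &Summary, b: &Summary) -> Option<Summary> {
       let total = a.weight + b.weight;
       if total.is_nan() || total <= 0.0 {
           return None;
       }
       let (wa, wb) = (a.weight / total, b.weight / total);
   
       // The mixture mean is the weighted average of the component means.
       let mean = wa * a.mean + wb * b.mean;
   
       // E[X^2] of the mixture is the weighted sum of each component's
       // E[X^2] = variance + mean^2.
       let second_moment =
           wa * (a.variance + a.mean * a.mean) + wb * (b.variance + b.mean * b.mean);
   
       Some(Summary {
           weight: total,
           mean,
           // Clamp to zero to absorb floating-point rounding.
           variance: (second_moment - mean * mean).max(0.0),
           // The mixture's range is the union of the input ranges.
           min: a.min.min(b.min),
           max: a.max.max(b.max),
       })
   }
   ```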
   ---
   
   I'll open a PR, and we can continue the discussion there.
   
   ### Describe alternatives you've considered
   
   No
   
   ### Additional context
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

