isidentical opened a new issue, #3813: URL: https://github.com/apache/arrow-datafusion/issues/3813
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

`distinct_count` is usually expensive to compute, so some platforms that write Parquet files omit it from the metadata section. We should be able to estimate the join cardinality without it before falling back to the cartesian product.

**Describe the solution you'd like**

Since we already require min/max values to be present, we should be able to just do `min(num_left_rows - num_nulls or 0, scalar_range(left_stats.min, left_stats.max))` to determine an alternative distinct count. In other words, the estimate is the smaller of the non-null row count (treating a missing null count as 0) and the number of values the min/max range can hold.

**Describe alternatives you've considered**

None.

**Additional context**

Original discussion was here: https://github.com/apache/arrow-datafusion/pull/3787#discussion_r992751749
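To make the proposal concrete, here is a minimal sketch of the fallback for an integer column. `ColumnStats`, `scalar_range`, and `estimate_distinct_count` are hypothetical names used only for illustration; they are not DataFusion's actual types or functions, and the sketch assumes an `i64` column whose range is inclusive on both ends.

```rust
/// Hypothetical, simplified per-column statistics (NOT DataFusion's real types).
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<usize>,
    distinct_count: Option<usize>,
}

/// Number of integer values in the inclusive range [min, max].
/// Returns None when the bounds are inverted.
fn scalar_range(min: i64, max: i64) -> Option<u128> {
    if max < min {
        return None;
    }
    // Widen to i128 so that i64::MAX - i64::MIN cannot overflow.
    Some((max as i128 - min as i128) as u128 + 1)
}

/// Use the stored distinct count when available; otherwise fall back to
/// min(num_rows - null_count or 0, scalar_range(min, max)).
fn estimate_distinct_count(num_rows: usize, stats: &ColumnStats) -> Option<usize> {
    if let Some(distinct) = stats.distinct_count {
        return Some(distinct);
    }
    // A missing null count is treated as 0, as in the proposed formula.
    let non_null_rows = num_rows.saturating_sub(stats.null_count.unwrap_or(0));
    // Without both min and max there is nothing to estimate from.
    let range = scalar_range(stats.min?, stats.max?)?;
    // The result never exceeds non_null_rows, so it fits back into usize.
    Some((non_null_rows as u128).min(range) as usize)
}
```

For example, 1000 rows with `min = 1` and `max = 10` and no null count gives an estimate of 10, since the range can hold at most ten distinct integers. When the function returns `None`, the caller would still have to fall back to the cartesian product.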
