msirek opened a new pull request, #8049: URL: https://github.com/apache/arrow-datafusion/pull/8049
## Which issue does this PR close?

Closes #8048.

## Rationale for this change

While testing #8038, I ran into some incorrect-results cases in `COUNT(*)` queries on a `LIMIT`ed relation, related to [exact statistics](https://github.com/apache/arrow-datafusion/pull/7793). The issue has two causes: `GlobalLimitExec` reports the `fetch` value plus the `skip` value as the output stats `num_rows`, and it reports `Exact` output statistics in cases where the input has `Inexact` statistics.

## What changes are included in this PR?

#### Fix incorrect results in COUNT(*) queries with LIMIT

This commit reworks the cases in `GlobalLimitExec::statistics` to cap the output stats `num_rows` at the `fetch` value instead of the `fetch+skip` value. The following cases are also modified:

- Output stats are copied from input stats when the number of input rows is less than `fetch` rows and `skip` is 0
- If (number of input rows - `skip`) <= `fetch`, output `num_rows` = input `num_rows` - `skip`
- If input stats are `Inexact` or `Absent`, output stats are `Inexact`
- If (number of input rows - `skip`) > `usize::MAX` and the `fetch` value is `None`, output stats are `Inexact`

## Are these changes tested?

- [x] unit tests for `GlobalLimitExec` statistics, both `Exact` and `Inexact`
- [x] sqllogictests for `GlobalLimitExec` statistics

## Are there any user-facing changes?

No
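To make the row-count rules listed above concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the `Precision` enum below is a simplified stand-in for DataFusion's type, `limited_num_rows` is a hypothetical helper name, and the handling of `Absent` input with no `fetch` is an assumption. The `usize::MAX` overflow case is not modeled.

```rust
// Simplified stand-in for a row-count statistic (assumption: not DataFusion's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

// Hypothetical helper: estimate the output `num_rows` of a limit operator
// given the input row-count statistic, `skip`, and an optional `fetch`.
fn limited_num_rows(input: Precision, skip: usize, fetch: Option<usize>) -> Precision {
    // Rows left after skipping, capped at `fetch` (not `fetch + skip`).
    let cap = |rows: usize| -> usize {
        let remaining = rows.saturating_sub(skip);
        match fetch {
            Some(f) => remaining.min(f),
            None => remaining,
        }
    };
    match input {
        // Exact input stays exact.
        Precision::Exact(rows) => Precision::Exact(cap(rows)),
        // Inexact input must never be promoted to exact.
        Precision::Inexact(rows) => Precision::Inexact(cap(rows)),
        // Unknown input: `fetch` is only an upper bound, so report it as inexact.
        // Assumption: with no `fetch` there is nothing to report.
        Precision::Absent => match fetch {
            Some(f) => Precision::Inexact(f),
            None => Precision::Absent,
        },
    }
}
```

Under these assumptions, an exact input of 100 rows with `skip` 10 and `fetch` 20 yields `Exact(20)` rather than 30, and an inexact input never produces an `Exact` result.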
