msirek opened a new pull request, #8049:
URL: https://github.com/apache/arrow-datafusion/pull/8049

   ## Which issue does this PR close?
   
   Closes #8048.
   
   ## Rationale for this change
   
   While testing #8038, I ran into cases where `COUNT(*)` queries over a
   `LIMIT`ed relation return incorrect results. The problem is related to
   [exact statistics](https://github.com/apache/arrow-datafusion/pull/7793).
   
   The issue has two causes: `GlobalLimitExec` reports the `fetch` value plus
   the `skip` value as the output stats `num_rows`, and it reports `Exact`
   output statistics even when the input has `Inexact` statistics.
   
   ## What changes are included in this PR?
   
   #### Fix incorrect results in COUNT(*) queries with LIMIT
   
   This commit reworks the cases in `GlobalLimitExec::statistics` to cap output
   stats `num_rows` at the `fetch` value instead of the `fetch+skip` value.
   
   Also, the following cases are modified:
   
   - Output stats are copied from input stats when the number of input rows
     is less than `fetch` and `skip` is 0
   - If (number of input rows - `skip`) <= `fetch`, output `num_rows` =
     input `num_rows` - `skip`
   - If input stats are `Inexact` or `Absent`, output stats are `Inexact`
   - If (number of input rows - `skip`) > `usize::MAX` and the `fetch` value
     is `None`, output stats are `Inexact`
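
The row-count rules above can be sketched as a small standalone function. This is a simplified illustration, not the code in this PR: `RowCount` and `limit_row_count` are hypothetical names that only loosely model the `Exact`/`Inexact`/`Absent` distinction, the `usize::MAX` case is left out, and returning `Absent` when both the input stats and `fetch` are missing is an assumption of the sketch.

```rust
/// Hypothetical stand-in for a row-count estimate (not DataFusion's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RowCount {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Sketch: output row count of a limit with `skip` and an optional `fetch`.
fn limit_row_count(input: RowCount, skip: usize, fetch: Option<usize>) -> RowCount {
    let (rows, exact) = match input {
        RowCount::Exact(n) => (n, true),
        RowCount::Inexact(n) => (n, false),
        // Nothing is known about the input, so `fetch` is only an upper bound.
        // Assumption of this sketch: with no `fetch` either, stay `Absent`.
        RowCount::Absent => return fetch.map_or(RowCount::Absent, RowCount::Inexact),
    };
    // Rows left after skipping; saturates at 0 when every row is skipped.
    let remaining = rows.saturating_sub(skip);
    // Cap at `fetch`, never at `fetch + skip`.
    let out = match fetch {
        Some(f) => remaining.min(f),
        None => remaining,
    };
    // An inexact input can never produce an exact output.
    if exact {
        RowCount::Exact(out)
    } else {
        RowCount::Inexact(out)
    }
}
```

With 100 exact input rows, `skip = 10` and `fetch = 20`, this yields `Exact(20)` rather than the `fetch + skip` value of 30. With 15 exact input rows and the same limit it yields `Exact(5)`.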
   
   ## Are these changes tested?
   
   - [x] unit tests for `GlobalLimitExec` statistics, both `Exact` and `Inexact`
   - [x] sqllogictests for `GlobalLimitExec` statistics
   
   ## Are there any user-facing changes?
   
   No

