924060929 opened a new pull request, #64306:
URL: https://github.com/apache/doris/pull/64306
### What problem does this PR solve?
Problem Summary:
`MergeTopNs` (the `MERGE_TOP_N` rewrite rule) merges a parent `TopN` into
its child `TopN`
when their order keys are compatible. When the parent `TopN` carries a
non-zero `OFFSET`, the
merged limit was computed as `min(parent.limit, child.limit)`, which ignores
that the parent
offset consumes rows from the child's output. The merged `TopN` therefore
keeps too many rows
and the query returns a wrong result.
Example:
```sql
SELECT * FROM (SELECT k, v FROM t ORDER BY k LIMIT 5) s ORDER BY k LIMIT 3
OFFSET 4;
```
The inner `ORDER BY k LIMIT 5` yields 5 rows; the outer `LIMIT 3 OFFSET 4`
skips 4 of them, so
only 1 row should remain. Before this PR the rule merged the two `TopN` into
`OFFSET 4 LIMIT 3`
(instead of `OFFSET 4 LIMIT 1`), so it returned 3 rows.
Fix: clamp the merged limit by `max(child.limit - parent.offset, 0)`, the
same semantics
already used by `MergeLimits.mergeLimit` for consecutive limits. The bug
only triggers when the
outer `TopN` has a non-zero offset (offset = 0 makes both formulas equal).
The existing unit test `MergeTopNsTest.testOffset` asserted the buggy value
(`limit == 10`,
while the correct value is `9`); this PR corrects that assertion as well.
### Release note
Fix the wrong result produced by the `MERGE_TOP_N` optimization when an outer
`ORDER BY ... LIMIT` carries a non-zero `OFFSET` over an inner `ORDER BY ...
LIMIT`.
### Check List (For Author)
- Test
- [x] Regression test
- [x] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- Behavior changed:
- [ ] No.
- [x] Yes. A stacked `TopN` whose outer `TopN` has a non-zero offset now
returns the correct (fewer) rows instead of extra rows.
- Does this need documentation?
- [x] No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]