zclllyybb commented on issue #64172: URL: https://github.com/apache/doris/issues/64172#issuecomment-4640858451
Breakwater-GitHub-Analysis-Slot: slot_84437a4c8059 Initial triage: this looks like a valid correctness bug in the partition top-n path for `rank()` / `dense_rank()`, not a data-layout issue. I checked current upstream `master` (`c27fd0ba968daffe8329186205315f8b1cd147b7`) and the reported control flow is still present: - `CreatePartitionTopNFromWindow` can push filters on `row_number()`, `rank()`, and `dense_rank()` into `LogicalPartitionTopN`. - `PartitionSortNode` maps `rank()` to `TopNAlgorithm::RANK` and `dense_rank()` to `TopNAlgorithm::DENSE_RANK`. - `PartitionSorter::_read_row_rank()` uses `_get_enough_data()` in two different meanings: "the rank/top-n limit has been reached" and "this sorter is fully exhausted". - For tied rows, those meanings diverge. With `WHERE rk = 1`, `_get_enough_data()` can become true after the first emitted row of the first rank group, while the same rank group still has rows left in the queue. - If the current output block reaches `batch_size` before the tie group is fully drained, the deferred EOS assignment sets `*eos = true`. `PartitionSortSourceOperatorX::get_sorted_block()` then advances `_sort_idx`, so the remaining rows in the current sorter are skipped permanently. This also explains why the row count follows `min(batch_size, tied_group_size)` with one pipeline instance, and why disabling `enable_partition_topn` avoids the wrong result by routing through the normal analytic path. `row_number()` is not affected in the same way because stopping after the configured number of rows is its intended semantics. The proposed fix direction is reasonable: `_read_row_rank()` needs a separate "finished" state, and EOS should only be reported when the input queue is exhausted or when the code has crossed into the next distinct rank group after the rank/top-n limit has already been satisfied. Reaching `_get_enough_data()` alone should not mark the current sorter as drained. Recommended next steps: - Treat this as a silent wrong-result bug with high priority. - Add coverage for both `dense_rank()` and `rank()` where the first tied rank group is larger than `batch_size`. - Include a control case with `enable_partition_topn = false` and at least one case with `batch_size` smaller than the tied group. - Please also confirm the affected release branches when preparing backports; the inspected upstream path matches the report, but I did not run the SQL repro locally in this triage. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
