richardstartin commented on issue #8837:
URL: https://github.com/apache/pinot/issues/8837#issuecomment-1149520182
> > All 10k records will be inserted into the PQ, regardless of whether the
insertion results in the removal of another or not. 9990 records will be
removed from the PQ. This is clearly wasteful.
>
> Not sure if we are referring to the same thing. In
`SelectionOperatorUtils.addToPriorityQueue()`, we insert the value into the
query only when it is larger than the top value. The PQ size will be up to 10
all the time
I think the communication gap here is because you may have missed that the
scope if this issue is strictly _descending order over sorted columns_ so you
have jumped to assuming I don't understand the algorithm.
_When the values are sorted and the comparator is descending_, the next
value is always greater than the smallest element in the PQ. This means the
next value always added, and the smallest value is removed after the limit is
reached. E.g. imagine the limit is 2, this table shows the state of the PQ for
each docId.
| docId | Value | PQ |
| ------ | ------ | ----------------------- |
| 0 | 10 | [10] |
| 1 | 15 | [10, 15] |
| 2 | 30 | [15, 30] |
| 3 | 400 | [30, 400] |
The PQ never grows beyond the size of 2, but that claim hasn't actually been
made. After the first two values, for descending order over a sorted column,
the second condition below is always true. Every value is added to the PQ once,
and all but the last 2 values are eventually removed from the PQ.
```java
public static <T> void addToPriorityQueue(T value, PriorityQueue<T> queue,
int maxNumValues) {
if (queue.size() < maxNumValues) {
queue.add(value);
} else if (queue.comparator().compare(queue.peek(), value) < 0) {
queue.poll();
queue.offer(value);
}
}
```
That is, for each block, the algorithm _"inserts up to 10k elements into a
priority queue to discard all but the last 10"_.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]