richardstartin opened a new issue #7980:
URL: https://github.com/apache/pinot/issues/7980
`InIdSetTransfomFunction.transformToIntValuesSV` works by expanding a set
into an inflated bitmap using 32 bits per docId by checking if each dictId is
in the `IdSet` or not:
```java
case INT:
int[] intValues =
_transformFunction.transformToIntValuesSV(projectionBlock);
for (int i = 0; i < length; i++) {
_results[i] = _idSet.contains(intValues[i]) ? 1 : 0;
}
break;
```
For `INT` columns, the `IdSet` is backed by a `RoaringBitmap` and `contains`
is a log(n) operation. This translates to a lot of time spent locating
`Container`s in large `IdSet`s:
<img width="840" alt="Screenshot 2022-01-07 at 09 07 56"
src="https://user-images.githubusercontent.com/16439049/148520146-fa404d9c-0090-489f-9f55-fb8dab15e115.png">
<img width="1539" alt="Screenshot 2022-01-07 at 09 07 37"
src="https://user-images.githubusercontent.com/16439049/148520116-beceed90-eb9e-4ed3-b64d-c3c511886df7.png">
This could be made more efficient by constructing a `RoaringBitmap` from
`intValues` and intersecting it with the `IdSet`, which would be possible
without breaking the abstraction by adding a method to `IdSet`:
```java
default void intersect(int length, int[] values, int[] result) {
for (int i = 0; i < length; i++) {
results[i] = contains(values[i]) ? 1 : 0;
}
}
```
Which can be implemented efficiently when the set is backed by a
`RoaringBitmap`.
Longer term, representing `BOOLEAN` values as an `int[]` seems unfortunate -
`BOOLEAN` columns could have been represented as a `RoaringBitmap` on disk and
in memory in the query layer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]