richardstartin opened a new issue #7980:
URL: https://github.com/apache/pinot/issues/7980


   `InIdSetTransformFunction.transformToIntValuesSV` works by expanding a set 
into an inflated bitmap that uses 32 bits per docId, checking whether each 
dictId is in the `IdSet`:
   
   ```java
         case INT:
           int[] intValues = _transformFunction.transformToIntValuesSV(projectionBlock);
           for (int i = 0; i < length; i++) {
             _results[i] = _idSet.contains(intValues[i]) ? 1 : 0;
           }
           break;
   ``` 
   
   For `INT` columns, the `IdSet` is backed by a `RoaringBitmap` and `contains` 
is an O(log n) operation. This translates to a lot of time spent locating 
`Container`s in large `IdSet`s:
   
   <img width="840" alt="Screenshot 2022-01-07 at 09 07 56" 
src="https://user-images.githubusercontent.com/16439049/148520146-fa404d9c-0090-489f-9f55-fb8dab15e115.png">
   
   <img width="1539" alt="Screenshot 2022-01-07 at 09 07 37" 
src="https://user-images.githubusercontent.com/16439049/148520116-beceed90-eb9e-4ed3-b64d-c3c511886df7.png">
   
   This could be made more efficient by constructing a `RoaringBitmap` from 
`intValues` and intersecting it with the `IdSet`, which would be possible 
without breaking the abstraction by adding a method to `IdSet`:
   
   ```java
   // Default: fall back to one contains() call per value.
   default void intersect(int length, int[] values, int[] result) {
     for (int i = 0; i < length; i++) {
       result[i] = contains(values[i]) ? 1 : 0;
     }
   }
   ```
   
   This default can then be overridden with an efficient implementation when 
the set is backed by a `RoaringBitmap`.
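
To illustrate why a batch method leaves room for a faster override, here is a minimal, standard-library-only sketch. It is not the bitmap intersection proposed above and does not use the RoaringBitmap API. `ToyContainerSet` is a hypothetical stand-in that only mimics the two-level layout (high 16 bits select a container, low 16 bits index into it), and its `intersect` simply reuses the last container it located, so runs of values sharing the same high bits skip the key search.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy set with a key/container layout; not Pinot's IdSet or a RoaringBitmap. */
final class ToyContainerSet {
  private final int[] keys;          // sorted high 16 bits of the members (0..65535)
  private final BitSet[] containers; // containers[i] holds the low 16 bits for keys[i]

  ToyContainerSet(int... members) {
    TreeMap<Integer, BitSet> byKey = new TreeMap<>();
    for (int m : members) {
      byKey.computeIfAbsent(m >>> 16, k -> new BitSet(1 << 16)).set(m & 0xFFFF);
    }
    keys = new int[byKey.size()];
    containers = new BitSet[byKey.size()];
    int i = 0;
    for (Map.Entry<Integer, BitSet> e : byKey.entrySet()) {
      keys[i] = e.getKey();
      containers[i] = e.getValue();
      i++;
    }
  }

  /** Point lookup: one binary search over the keys for every call. */
  boolean contains(int value) {
    int idx = Arrays.binarySearch(keys, value >>> 16);
    return idx >= 0 && containers[idx].get(value & 0xFFFF);
  }

  /** Batch lookup: the key search is repeated only when the high 16 bits change. */
  void intersect(int length, int[] values, int[] result) {
    int cachedKey = -1;   // never equals a real key, which is always in 0..65535
    BitSet cached = null; // null means "no container exists for cachedKey"
    for (int i = 0; i < length; i++) {
      int key = values[i] >>> 16;
      if (key != cachedKey) {
        int idx = Arrays.binarySearch(keys, key);
        cached = idx >= 0 ? containers[idx] : null;
        cachedKey = key;
      }
      result[i] = (cached != null && cached.get(values[i] & 0xFFFF)) ? 1 : 0;
    }
  }
}
```

Both methods return the same answers; the batch version only helps when neighbouring values tend to fall in the same container, which is an assumption about the data rather than something measured here.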
   
   Longer term, representing `BOOLEAN` values as an `int[]` seems unfortunate - 
`BOOLEAN` columns could have been represented as a `RoaringBitmap` on disk and 
in memory in the query layer.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


