richardstartin commented on a change in pull request #7630:
URL: https://github.com/apache/pinot/pull/7630#discussion_r735725116
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountHLLAggregationFunction.java
##########
@@ -326,6 +375,54 @@ public Long extractFinalResult(HyperLogLog intermediateResult) {
     return intermediateResult.cardinality();
   }
+  /**
+   * Returns the dictionary id bitmap from the result holder or creates a new one if it does not exist.
+   */
+  protected static RoaringBitmap getDictIdBitmap(AggregationResultHolder aggregationResultHolder,
+      Dictionary dictionary) {
+    DistinctCountHLLAggregationFunction.DictIdsWrapper dictIdsWrapper = aggregationResultHolder.getResult();
+    if (dictIdsWrapper == null) {
+      dictIdsWrapper = new DistinctCountHLLAggregationFunction.DictIdsWrapper(dictionary);
+      aggregationResultHolder.setValue(dictIdsWrapper);
+    }
+    return dictIdsWrapper._dictIdBitmap;
+  }
+
+  /**
+   * Returns the dictionary id bitmap for the given group key or creates a new one if it does not exist.
+   */
+  protected static RoaringBitmap getDictIdBitmap(GroupByResultHolder groupByResultHolder, int groupKey,
+      Dictionary dictionary) {
+    DistinctCountHLLAggregationFunction.DictIdsWrapper dictIdsWrapper = groupByResultHolder.getResult(groupKey);
+    if (dictIdsWrapper == null) {
+      dictIdsWrapper = new DistinctCountHLLAggregationFunction.DictIdsWrapper(dictionary);
+      groupByResultHolder.setValueForKey(groupKey, dictIdsWrapper);
+    }
+    return dictIdsWrapper._dictIdBitmap;
+  }
+
+  /**
+   * Helper method to set dictionary id for the given group keys into the result holder.
+   */
+  private static void setDictIdForGroupKeys(GroupByResultHolder groupByResultHolder, int[] groupKeys,
+      Dictionary dictionary, int dictId) {
+    for (int groupKey : groupKeys) {
+      getDictIdBitmap(groupByResultHolder, groupKey, dictionary).add(dictId);
+    }
+  }
+
+  private HyperLogLog convertToHLL(DictIdsWrapper dictIdsWrapper) {
+    Dictionary dictionary = dictIdsWrapper._dictionary;
+    RoaringBitmap dictIdBitmap = dictIdsWrapper._dictIdBitmap;
+    int numValues = dictIdBitmap.getCardinality();
+    PeekableIntIterator iterator = dictIdBitmap.getIntIterator();
+    HyperLogLog hyperLogLog = new HyperLogLog(_log2m);
+    while (iterator.hasNext()) {
+      hyperLogLog.offer(iterator.next());
+    }
Review comment:
I don't think there are any articles written about it, but the general
problem with iterators is switching back and forth between two contexts (the
caller's loop and the iterator's own cursor), which doesn't have to happen
with inverted iteration, where the bitmap drives the loop and pushes each
value to a consumer. If you consider iterating over a bitmap, you can see that
the iterator has to save its state each time it yields:
```java
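// The cursor (x, w) lives in fields, so every call reloads and stores it.
@Override
public boolean hasNext() {
  return x < bitmap.length;
}

@Override
public char next() {
  // Position of the lowest set bit in the current word.
  char answer = (char) (x * 64 + numberOfTrailingZeros(w));
  w &= (w - 1); // clear that bit
  // Move on to the next non-empty word, if there is one.
  while (w == 0) {
    ++x;
    if (x == bitmap.length) {
      break;
    }
    w = bitmap[x];
  }
  return answer;
}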
```
vs
```java
int high = msb << 16;
for (int x = 0; x < bitmap.length; ++x) {
  long w = bitmap[x];
  while (w != 0) {
    ic.accept(((x << 6) + numberOfTrailingZeros(w)) | high);
    w &= (w - 1);
  }
}
```
Tiny differences like that just add up over thousands of invocations.
@lemire might have something to add here.
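
For anyone who wants to try the two styles side by side, here is a minimal, self-contained sketch over a plain `long[]`. It is not the RoaringBitmap code: the class and method names (`BitmapIterationSketch`, `PullIterator`, `forEach`, `collectWithIterator`, `collectWithForEach`) are made up for illustration, it uses `java.util.function.IntConsumer` in place of the library's own consumer type, and it drops the `| high` step because there is only one container here. It only shows that both styles visit the same values; it is not a benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical names throughout; a sketch, not the RoaringBitmap implementation.
final class BitmapIterationSketch {

  /** Pull style: the cursor (x, w) is kept in fields between calls. */
  static final class PullIterator {
    private final long[] bitmap;
    private int x;  // index of the current word
    private long w; // bits of the current word not yet returned

    PullIterator(long[] bitmap) {
      this.bitmap = bitmap;
      // Position the cursor on the first non-empty word, if any.
      while (x < bitmap.length && (w = bitmap[x]) == 0) {
        ++x;
      }
    }

    boolean hasNext() {
      return x < bitmap.length;
    }

    int next() {
      int answer = x * 64 + Long.numberOfTrailingZeros(w);
      w &= (w - 1); // clear the lowest set bit
      while (w == 0) {
        ++x;
        if (x == bitmap.length) {
          break;
        }
        w = bitmap[x];
      }
      return answer;
    }
  }

  /** Push ("inverted") style: the cursor stays in local variables for the whole scan. */
  static void forEach(long[] bitmap, IntConsumer consumer) {
    for (int x = 0; x < bitmap.length; ++x) {
      long w = bitmap[x];
      while (w != 0) {
        consumer.accept((x << 6) + Long.numberOfTrailingZeros(w));
        w &= (w - 1);
      }
    }
  }

  static List<Integer> collectWithIterator(long[] bitmap) {
    List<Integer> values = new ArrayList<>();
    PullIterator it = new PullIterator(bitmap);
    while (it.hasNext()) {
      values.add(it.next());
    }
    return values;
  }

  static List<Integer> collectWithForEach(long[] bitmap) {
    List<Integer> values = new ArrayList<>();
    forEach(bitmap, values::add);
    return values;
  }

  public static void main(String[] args) {
    long[] bitmap = {0b1001L, 0L, 1L << 63};
    System.out.println(collectWithIterator(bitmap)); // prints [0, 3, 191]
    System.out.println(collectWithForEach(bitmap));  // prints [0, 3, 191]
  }
}
```

In the loop from the diff, the push style would mean handing `hyperLogLog::offer` (or a small lambda) to the bitmap's `forEach` rather than pulling values through `getIntIterator()`. Whether that is measurably faster in this code path is something to benchmark, not assume.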
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]