tarun11Mavani commented on code in PR #18760:
URL: https://github.com/apache/pinot/pull/18760#discussion_r3441502114
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/funnel/BitmapResultExtractionStrategy.java:
##########
@@ -42,14 +49,32 @@ public List<RoaringBitmap>
extractIntermediateResult(DictIdsWrapper dictIdsWrapp
}
return result;
}
- Dictionary dictionary = dictIdsWrapper._dictionary;
List<RoaringBitmap> result = new ArrayList<>(_numSteps);
- for (RoaringBitmap dictIdBitmap : dictIdsWrapper._stepsBitmaps) {
- result.add(convertToValueBitmap(dictionary, dictIdBitmap));
+ if (dictIdsWrapper.isMultiKey()) {
+ for (RoaringBitmap compositeIdBitmap : dictIdsWrapper._stepsBitmaps) {
+ result.add(convertCompositeToValueBitmap(dictIdsWrapper,
compositeIdBitmap));
+ }
+ } else {
+ Dictionary dictionary = dictIdsWrapper._dictionaries[0];
+ for (RoaringBitmap dictIdBitmap : dictIdsWrapper._stepsBitmaps) {
+ result.add(convertToValueBitmap(dictionary, dictIdBitmap));
+ }
}
return result;
}
+ private RoaringBitmap convertCompositeToValueBitmap(DictIdsWrapper wrapper,
RoaringBitmap compositeIdBitmap) {
+ RoaringBitmap valueBitmap = new RoaringBitmap();
+ PeekableIntIterator iterator = compositeIdBitmap.getIntIterator();
+ int numKeys = wrapper._dictionaries.length;
+ int[] dictIds = new int[numKeys];
+ while (iterator.hasNext()) {
+ wrapper.reverseCompositeId(iterator.next(), dictIds);
+ valueBitmap.add(DictIdsWrapper.toCompositeString(wrapper._dictionaries,
dictIds).hashCode());
Review Comment:
This is actually an existing limitation of the bitmap strategy, not
something new from multi-key. The single-key path in `convertToValueBitmap`
already uses `.hashCode()` for LONG, FLOAT, DOUBLE, and STRING types — only INT
gets exact values stored directly. The multi-key path is consistent:
`toCompositeString` itself is collision-free (length-prefix encoding is
injective), but the `.hashCode()` mapping to 32-bit int has the same collision
properties as single-key STRING at line 109.
I've updated the method Javadoc on `convertCompositeToValueBitmap` to call
this out more explicitly, linking it to the existing single-key non-INT
approximation.
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/funnel/ThetaSketchAggregationStrategy.java:
##########
@@ -66,8 +75,14 @@ void add(Dictionary dictionary, UpdateSketch[]
stepsSketches, int step, int corr
sketch.update(dictionary.getStringValue(correlationId));
break;
default:
- throw new IllegalStateException("Illegal CORRELATED_BY column data
type for FUNNEL_COUNT aggregation function: "
- + dictionary.getValueType());
+ throw new IllegalStateException(
+ "Illegal CORRELATED_BY column data type for FUNNEL_COUNT
aggregation function: "
+ + dictionary.getValueType());
}
}
+
+ @Override
+ void addMultiKey(UpdateSketch[] stepsSketches, int step, Dictionary[]
dictionaries, int[] correlationDictIds) {
+ stepsSketches[step].update(DictIdsWrapper.toCompositeString(dictionaries,
correlationDictIds));
Review Comment:
Fair point — `toCompositeString` does allocate a new `StringBuilder` +
`String` per row. A couple of things to note though:
- Theta sketch's `update()` only accepts primitives (`int`, `long`,
`double`) or `String`/`byte[]`. Since a multi-key tuple has no single primitive
representation, some form of serialization is unavoidable here.
- The single-key STRING path (line 75) already allocates a string per row
via `dictionary.getStringValue()`, so the cost pattern is structurally similar
— just slightly more overhead from the length-prefix encoding.
- JMH baseline for multi-key theta_sketch is 287 ops/s, which we can measure
future optimizations against.
One option would be switching to `update(byte[])` with a reusable
`ByteBuffer` to avoid the String intermediate, but wanted to keep it simple for
the initial implementation. Do you have other optimization ideas in mind?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]