yashmayya commented on code in PR #13835:
URL: https://github.com/apache/pinot/pull/13835#discussion_r1746502018
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/aggregator/DistinctCountCPCSketchValueAggregator.java:
##########
@@ -34,11 +35,11 @@ public class DistinctCountCPCSketchValueAggregator
implements ValueAggregator<Ob
private final int _lgK;
public DistinctCountCPCSketchValueAggregator(List<ExpressionContext>
arguments) {
Review Comment:
Sketches with different `lgK` are mergeable - while the accuracy will
differ, I was initially a bit hesitant to add this check because the
`DistinctCountCPCSketchAggregationFunction` takes in the number of nominal
entries which is `2 ^ lgK` whereas `DistinctCountCPCSketchValueAggregator`
takes in `lgK` directly. So I was wondering if it'll be confusing for users to
determine which queries with `DistinctCountCPCSketchAggregationFunction` can
use the star-tree index.
However, thinking about it some more, it doesn't really make sense to allow
configuring `lgK` on a star-tree index for
`DistinctCountCPCSketchAggregationFunction` but not add the corresponding query
matching logic. I've made this change and I'll plan to add this caveat to the
documentation.
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunction.java:
##########
@@ -132,6 +133,32 @@ default FinalResult mergeFinalResult(FinalResult
finalResult1, FinalResult final
throw new UnsupportedOperationException("Cannot merge final results for
function: " + getType());
}
+ /**
+ * Returns whether a star-tree index with the specified properties can be
used for this aggregation function.
+ */
+ default boolean canUseStarTree(AggregationFunctionColumnPair
functionColumnPair,
Review Comment:
That's true, my original intention was to push more of that matching logic
into `AggregationFunction` itself, but I think that might require more
unrelated changes. I can remove this for now.
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/PercentileTDigestAggregationFunction.java:
##########
@@ -241,6 +243,24 @@ public Double extractFinalResult(TDigest
intermediateResult) {
return intermediateResult.quantile(_percentile / 100.0);
}
+ @Override
+ public boolean canUseStarTree(AggregationFunctionColumnPair
functionColumnPair,
+ Map<String, Object> functionParameters) {
+ if (!super.canUseStarTree(functionColumnPair, functionParameters)) {
+ return false;
+ }
+
+ // Check if compression factor matches
+ if
(functionParameters.containsKey(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY))
{
Review Comment:
They are mergeable but the percentile accuracy will differ from what the
user requested at query time, so I thought it'd be safer to only use the
star-tree index when the compression factor value matches.
##########
pinot-core/src/main/java/org/apache/pinot/core/startree/StarTreeUtils.java:
##########
@@ -363,8 +372,14 @@ public static BaseProjectOperator<?>
createStarTreeBasedProjectOperator(IndexSeg
ExpressionContext[] groupByExpressions =
queryContext.getGroupByExpressions() != null ?
queryContext.getGroupByExpressions()
.toArray(new ExpressionContext[0]) : null;
+
+ List<Pair<AggregationFunction, AggregationFunctionColumnPair>>
aggregations =
Review Comment:
Hm, the aggregation function length should be in the order of 10s at most so
I thought it wouldn't really make a difference right? I'm fine either way
though since the loop version is also pretty minimal, so I've made the change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]