Re: [PR] Add support for configuring aggregation function parameters in the star-tree index [pinot]

via GitHub Thu, 05 Sep 2024 22:21:15 -0700


yashmayya commented on code in PR #13835:
URL: https://github.com/apache/pinot/pull/13835#discussion_r1746502018



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/aggregator/DistinctCountCPCSketchValueAggregator.java:
##########
@@ -34,11 +35,11 @@ public class DistinctCountCPCSketchValueAggregator 
implements ValueAggregator<Ob
   private final int _lgK;
 
   public DistinctCountCPCSketchValueAggregator(List<ExpressionContext> 
arguments) {

Review Comment:
   Sketches with different `lgK` are mergeable - while the accuracy will 
differ, I was initially a bit hesitant to add this check because the 
`DistinctCountCPCSketchAggregationFunction` takes in the number of nominal 
entries which is `2 ^ lgK` whereas `DistinctCountCPCSketchValueAggregator` 
takes in `lgK` directly. So I was wondering if it'll be confusing for users to 
determine which queries with `DistinctCountCPCSketchAggregationFunction` can 
use the star-tree index.
   
   However, thinking about it some more, it doesn't really make sense to allow 
configuring `lgK` on a star-tree index for 
`DistinctCountCPCSketchAggregationFunction` but not add the corresponding query 
matching logic. I've made this change and I'll plan to add this caveat to the 
documentation.



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunction.java:
##########
@@ -132,6 +133,32 @@ default FinalResult mergeFinalResult(FinalResult 
finalResult1, FinalResult final
     throw new UnsupportedOperationException("Cannot merge final results for 
function: " + getType());
   }
 
+  /**
+   * Returns whether a star-tree index with the specified properties can be 
used for this aggregation function.
+   */
+  default boolean canUseStarTree(AggregationFunctionColumnPair 
functionColumnPair,

Review Comment:
   That's true, my original intention was to push more of that matching logic 
into `AggregationFunction` itself, but I think that might require more 
unrelated changes. I can remove this for now.



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/PercentileTDigestAggregationFunction.java:
##########
@@ -241,6 +243,24 @@ public Double extractFinalResult(TDigest 
intermediateResult) {
     return intermediateResult.quantile(_percentile / 100.0);
   }
 
+  @Override
+  public boolean canUseStarTree(AggregationFunctionColumnPair 
functionColumnPair,
+      Map<String, Object> functionParameters) {
+    if (!super.canUseStarTree(functionColumnPair, functionParameters)) {
+      return false;
+    }
+
+    // Check if compression factor matches
+    if 
(functionParameters.containsKey(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY))
 {

Review Comment:
   They are mergeable but the percentile accuracy will differ from what the 
user requested at query time, so I thought it'd be safer to only use the 
star-tree index when the compression factor value matches.



##########
pinot-core/src/main/java/org/apache/pinot/core/startree/StarTreeUtils.java:
##########
@@ -363,8 +372,14 @@ public static BaseProjectOperator<?> 
createStarTreeBasedProjectOperator(IndexSeg
     ExpressionContext[] groupByExpressions =
         queryContext.getGroupByExpressions() != null ? 
queryContext.getGroupByExpressions()
             .toArray(new ExpressionContext[0]) : null;
+
+    List<Pair<AggregationFunction, AggregationFunctionColumnPair>> 
aggregations =

Review Comment:
   Hm, the aggregation function length should be in the order of 10s at most so 
I thought it wouldn't really make a difference right? I'm fine either way 
though since the loop version is also pretty minimal, so I've made the change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add support for configuring aggregation function parameters in the star-tree index [pinot]

Reply via email to