davecromberge commented on code in PR #12042:
URL: https://github.com/apache/pinot/pull/12042#discussion_r1408425307


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java:
##########
@@ -102,9 +111,22 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
       Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
           "Second argument of DISTINCT_COUNT_THETA_SKETCH aggregation function must be literal (parameters)");
       Parameters parameters = new Parameters(paramsExpression.getLiteral().getStringValue());
+      // Allows the user to trade-off memory usage for merge CPU; higher values use more memory
+      _accumulatorThreshold = parameters.getAccumulatorThreshold();
+      // Ordering controls whether intermediate compact sketches are ordered in set operations
+      _intermediateOrdering = parameters.getIntermediateOrdering();
+      // Nominal entries controls sketch accuracy and size
       int nominalEntries = parameters.getNominalEntries();
       _updateSketchBuilder.setNominalEntries(nominalEntries);
       _setOperationBuilder.setNominalEntries(nominalEntries);
+      // Sampling probability sets the initial value of Theta, defaults to 1.0
+      float p = parameters.getSamplingProbability();
+      _setOperationBuilder.setP(p);
+      _updateSketchBuilder.setP(p);
+      // Resize factor controls the size multiple that affects how fast the internal cache grows
+      ResizeFactor rf = parameters.getResizeFactor();

Review Comment:
   The resize factor trades off memory consumption for performance.  When the 
internal hash table of the sketch reaches capacity, it must be resized, which 
is an expensive operation because all existing keys need to be rehashed.  
   A different resize factor controls how large the initial hash table is on 
construction, thereby trading off the cost of resize operations against 
potential over-allocation of memory on heap.
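
To make the trade-off concrete, here is a rough toy model that uses only the Java standard library. It is not the DataSketches implementation: the class `ResizeFactorModel`, the method `simulate`, the half-full growth trigger, and the starting size of 2^5 slots are all assumptions chosen for illustration. The only detail taken from the library is that the resize factors X1, X2, X4 and X8 correspond to log2 growth steps of 0 to 3, with X1 meaning the table is allocated at full size up front.

```java
// Toy model only: NOT the DataSketches implementation.
// The growth trigger and starting size below are illustrative assumptions.
final class ResizeFactorModel {

    /**
     * Simulates inserting `keys` distinct keys into a power-of-two hash table.
     *
     * @param lgTarget       log2 of the final table size
     * @param lgResizeFactor log2 of the growth multiple (0 = allocate the full table up front)
     * @param lgStart        log2 of the starting table size when growth is enabled
     * @return {initialCapacity, resizeCount, totalKeysRehashed}
     */
    static long[] simulate(int lgTarget, int lgResizeFactor, int lgStart, int keys) {
        int lgCapacity = (lgResizeFactor == 0) ? lgTarget : Math.min(lgStart, lgTarget);
        long initialCapacity = 1L << lgCapacity;
        long resizes = 0;
        long rehashed = 0;
        for (int count = 1; count <= keys; count++) {
            // Assumed trigger: grow once the table is more than half full.
            boolean overHalfFull = count > (1L << lgCapacity) / 2;
            if (overHalfFull && lgCapacity < lgTarget) {
                lgCapacity = Math.min(lgCapacity + lgResizeFactor, lgTarget);
                resizes++;
                rehashed += count; // every key present must be re-inserted into the new table
            }
        }
        return new long[] {initialCapacity, resizes, rehashed};
    }

    public static void main(String[] args) {
        int lgTarget = 13; // 8192 slots
        int lgStart = 5;   // 32 slots
        int keys = 4096;
        for (int lgRf = 0; lgRf <= 3; lgRf++) {
            long[] r = simulate(lgTarget, lgRf, lgStart, keys);
            System.out.printf("X%d: initial=%d resizes=%d rehashed=%d%n",
                1 << lgRf, r[0], r[1], r[2]);
        }
    }
}
```

In this model a larger factor reaches the target size in fewer resize steps and rehashes fewer keys overall, while X1 pays for the whole table at construction even if only a handful of keys ever arrive.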



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

