davecromberge commented on code in PR #12042:
URL: https://github.com/apache/pinot/pull/12042#discussion_r1408425307
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java:
##########
@@ -102,9 +111,22 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
"Second argument of DISTINCT_COUNT_THETA_SKETCH aggregation function must be literal (parameters)");
Parameters parameters = new Parameters(paramsExpression.getLiteral().getStringValue());
+ // Allows the user to trade-off memory usage for merge CPU; higher values use more memory
+ _accumulatorThreshold = parameters.getAccumulatorThreshold();
+ // Ordering controls whether intermediate compact sketches are ordered in set operations
+ _intermediateOrdering = parameters.getIntermediateOrdering();
+ // Nominal entries controls sketch accuracy and size
int nominalEntries = parameters.getNominalEntries();
_updateSketchBuilder.setNominalEntries(nominalEntries);
_setOperationBuilder.setNominalEntries(nominalEntries);
+ // Sampling probability sets the initial value of Theta, defaults to 1.0
+ float p = parameters.getSamplingProbability();
+ _setOperationBuilder.setP(p);
+ _updateSketchBuilder.setP(p);
+ // Resize factor controls the size multiple that affects how fast the internal cache grows
+ ResizeFactor rf = parameters.getResizeFactor();
Review Comment:
The resize factor trades off memory consumption against performance. When the
internal hash table of the sketch reaches capacity, it must be resized, which
is an expensive operation because all existing keys need to be rehashed.
Choosing a different resize factor controls how large the hash table is on
construction and how much it grows at each resize, thereby trading off the
cost of resize operations against potential over-allocation of heap memory.
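To make this trade-off concrete, here is a hypothetical, simplified model in Java. It is not the DataSketches `UpdateSketch` code: the class `ResizeFactorDemo`, the 50% load threshold, the minimum capacity of 32 and the rule for picking the starting size are all assumptions chosen for illustration. It grows a linear-probing hash table by the resize factor and counts how many resizes happen and how many existing keys must be rehashed.

```java
/**
 * Hypothetical, simplified model (NOT the DataSketches implementation) of a
 * growable open-addressing hash table whose growth is governed by a resize factor.
 */
public class ResizeFactorDemo {
  private static final long EMPTY = 0L; // 0 marks an empty slot, so keys must be nonzero

  private long[] table;
  private int count;
  private final int maxCapacity;  // full size; assumed to be a power of two
  private final int resizeFactor; // 1, 2, 4 or 8 in this model
  final int initialCapacity;
  int resizes;
  long rehashedKeys;              // total keys moved across all resizes

  ResizeFactorDemo(int maxCapacity, int resizeFactor, int minCapacity) {
    this.maxCapacity = maxCapacity;
    this.resizeFactor = resizeFactor;
    // Assumed rule: start at maxCapacity / resizeFactor^n, the smallest such size >= minCapacity.
    int capacity = maxCapacity;
    while (resizeFactor > 1 && capacity / resizeFactor >= minCapacity) {
      capacity /= resizeFactor;
    }
    this.initialCapacity = capacity;
    this.table = new long[capacity];
  }

  /** Inserts a nonzero key, growing first if the load factor would exceed 50%. */
  boolean insert(long key) {
    if (2 * (count + 1) > table.length) {
      if (table.length >= maxCapacity) {
        // A real theta sketch would rebuild by discarding hashes above theta; not modeled here.
        throw new IllegalStateException("table full: theta rebuild not modeled");
      }
      resize(Math.min(table.length * resizeFactor, maxCapacity));
    }
    boolean added = put(table, key);
    if (added) {
      count++;
    }
    return added;
  }

  int capacity() {
    return table.length;
  }

  private void resize(int newCapacity) {
    long[] bigger = new long[newCapacity];
    for (long k : table) {
      if (k != EMPTY) {
        put(bigger, k); // every existing key is rehashed into the new array
        rehashedKeys++;
      }
    }
    table = bigger;
    resizes++;
  }

  /** Linear-probing insert; the table length must be a power of two. */
  private static boolean put(long[] t, long key) {
    int mask = t.length - 1;
    int i = (int) (mix(key) & mask);
    while (t[i] != EMPTY) {
      if (t[i] == key) {
        return false;
      }
      i = (i + 1) & mask;
    }
    t[i] = key;
    return true;
  }

  /** MurmurHash3 fmix64 finalizer, used here only to scatter keys across slots. */
  private static long mix(long h) {
    h ^= h >>> 33;
    h *= 0xff51afd7ed558ccdL;
    h ^= h >>> 33;
    h *= 0xc4ceb9fe1a85ec53L;
    h ^= h >>> 33;
    return h;
  }

  /** Inserts the keys 1..n into a fresh table and returns it for inspection. */
  static ResizeFactorDemo run(int maxCapacity, int resizeFactor, int minCapacity, int n) {
    ResizeFactorDemo d = new ResizeFactorDemo(maxCapacity, resizeFactor, minCapacity);
    for (long key = 1; key <= n; key++) {
      d.insert(key);
    }
    return d;
  }

  public static void main(String[] args) {
    for (int rf : new int[] {1, 2, 4, 8}) {
      ResizeFactorDemo d = run(8192, rf, 32, 3000);
      System.out.printf("rf=%d initial=%d resizes=%d rehashed=%d final=%d%n",
          rf, d.initialCapacity, d.resizes, d.rehashedKeys, d.capacity());
    }
  }
}
```

In this model, inserting 3000 distinct keys with a full size of 8192 slots allocates everything up front for a factor of 1 and never rehashes, while a factor of 2 starts at 32 slots and rehashes 4080 keys over 8 resizes, and a factor of 8 rehashes only 576 keys over 2 resizes but jumps from 1024 to 8192 slots while holding just 513 keys. Larger factors pay for fewer rehashes with coarser, more wasteful growth steps.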
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.