davecromberge commented on code in PR #12042:
URL: https://github.com/apache/pinot/pull/12042#discussion_r1408423857
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java:
##########
@@ -102,9 +111,22 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
"Second argument of DISTINCT_COUNT_THETA_SKETCH aggregation function must be literal (parameters)");
Parameters parameters = new Parameters(paramsExpression.getLiteral().getStringValue());
+ // Allows the user to trade-off memory usage for merge CPU; higher values use more memory
+ _accumulatorThreshold = parameters.getAccumulatorThreshold();
+ // Ordering controls whether intermediate compact sketches are ordered in set operations
+ _intermediateOrdering = parameters.getIntermediateOrdering();
+ // Nominal entries controls sketch accuracy and size
int nominalEntries = parameters.getNominalEntries();
_updateSketchBuilder.setNominalEntries(nominalEntries);
_setOperationBuilder.setNominalEntries(nominalEntries);
+ // Sampling probability sets the initial value of Theta, defaults to 1.0
Review Comment:
This sampling probability is useful to end users who build sketches on the
fly, either from raw values or from existing sketches. These users might wish
to trim down the size of degenerate sketches (sketches that are below
capacity). In certain use cases, such as ordering by top-ranked items, the
user might not care if the error for the tail of the list gets worse.
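
To make the trade-off concrete, here is a rough, self-contained sketch of the
idea. It is a toy model and not the DataSketches implementation: the class
`ToyThetaSketch`, its hash mixer, and the absence of a nominal-entries cap are
all assumptions made for illustration. It only shows that starting Theta at a
sampling probability p below 1.0 makes a below-capacity sketch retain roughly
a fraction p of the distinct hashes, while the estimate is scaled back up by
1/p at the cost of a higher error.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of theta sampling (hypothetical; NOT the DataSketches code). */
final class ToyThetaSketch {
    // Initial Theta, set from the sampling probability p; 1.0 keeps every hash.
    private final double theta;
    // Distinct hashes that fell below Theta. No nominal-entries cap is modelled.
    private final Set<Long> retained = new HashSet<>();

    ToyThetaSketch(double samplingProbability) {
        if (!(samplingProbability > 0.0 && samplingProbability <= 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1]");
        }
        this.theta = samplingProbability;
    }

    /** SplitMix64-style mixer, standing in for the sketch's real hash function. */
    private static long mix(long value) {
        long z = value * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Retains the value's hash only if it lands below Theta. */
    void update(long value) {
        long hash = mix(value);
        double unit = (hash >>> 11) * 0x1.0p-53; // top 53 bits mapped to [0, 1)
        if (unit < theta) {
            retained.add(hash);
        }
    }

    int getRetainedEntries() {
        return retained.size();
    }

    /** Distinct-count estimate: retained entries scaled up by 1 / Theta. */
    double getEstimate() {
        return retained.size() / theta;
    }

    public static void main(String[] args) {
        ToyThetaSketch full = new ToyThetaSketch(1.0);
        ToyThetaSketch sampled = new ToyThetaSketch(0.25);
        for (long i = 0; i < 10_000; i++) {
            full.update(i);
            sampled.update(i);
        }
        System.out.println("p=1.0  retained=" + full.getRetainedEntries()
            + " estimate=" + full.getEstimate());
        System.out.println("p=0.25 retained=" + sampled.getRetainedEntries()
            + " estimate=" + sampled.getEstimate());
    }
}
```

With p = 1.0 the toy sketch stays exact while below capacity; with a smaller p
it holds fewer entries and the estimate becomes approximate, which is the
memory-versus-error trade described above.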
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]