gianm opened a new pull request, #13247:
URL: https://github.com/apache/druid/pull/13247
The SQL sketch aggregation functions are documented as creating sketches.
However, they are planned into native aggregators that include finalization
logic to convert the sketch to a number. This creates an inconsistency: the
functions sometimes return sketches and sometimes return numbers, depending on
where they lie in the native query plan.
This patch changes these SQL aggregators to _never_ finalize, by using the
"shouldFinalize" feature of the native aggregators. That feature already
existed for theta sketches. This patch adds it for HLL and quantiles sketches.
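
As a rough illustration of the native side, the aggregator specs produced by the planner might look something like the output of the snippet below. This is a sketch under assumptions, not code from the patch: the column names (`user_id`, `latency`) and output names (`a0`, `a1`, `a2`) are hypothetical, and the "shouldFinalize" property on `HLLSketchBuild` and `quantilesDoublesSketch` is assumed to mirror the one that already existed on `thetaSketch`.

```python
import json


def sketch_aggregator(agg_type, name, field_name, should_finalize=False):
    """Build a native aggregator spec as a plain dict (illustrative only)."""
    return {
        "type": agg_type,
        "name": name,
        "fieldName": field_name,
        # False asks the aggregator to keep the sketch instead of
        # converting it to a number.
        "shouldFinalize": should_finalize,
    }


# Hypothetical columns and output names, one spec per sketch family.
aggregators = [
    sketch_aggregator("thetaSketch", "a0", "user_id"),
    sketch_aggregator("HLLSketchBuild", "a1", "user_id"),
    sketch_aggregator("quantilesDoublesSketch", "a2", "latency"),
]

print(json.dumps(aggregators, indent=2))
```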
As to impact, Druid finalizes aggregators in two cases:
- When they appear in the outer level of a query (not a subquery).
- When they are used as input to an expression or finalizing-field-access
post-aggregator (not any other kind of post-aggregator).
With this patch, the functions will no longer be finalized in these cases.
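
The two cases above can be summarized with a toy model. It only restates the rules from this description and is not Druid code: the `Consumer` enum and the `sketch_is_finalized` function are hypothetical names made up for illustration.

```python
from enum import Enum


class Consumer(Enum):
    """What reads the aggregator's value (hypothetical classification)."""
    EXPRESSION = "expression post-aggregator"
    FINALIZING_FIELD_ACCESS = "finalizing-field-access post-aggregator"
    OTHER_POST_AGGREGATOR = "any other kind of post-aggregator"


FINALIZING_CONSUMERS = {Consumer.EXPRESSION, Consumer.FINALIZING_FIELD_ACCESS}


def sketch_is_finalized(should_finalize, is_outer_query, consumer=None):
    """Toy model: does this use of a sketch aggregator yield a number?

    consumer=None means the value is returned directly as a result column.
    """
    if not should_finalize:
        # With this patch, the SQL sketch functions always take this branch.
        return False
    if consumer is None:
        # Case 1: finalized only at the outer level, not in a subquery.
        return is_outer_query
    # Case 2: finalized only for these two kinds of post-aggregator.
    return consumer in FINALIZING_CONSUMERS


# Old behavior (shouldFinalize = true): depends on where the value is used.
old_outer = sketch_is_finalized(True, is_outer_query=True)
old_subquery = sketch_is_finalized(True, is_outer_query=False)
# New behavior (shouldFinalize = false): always a sketch.
new_outer = sketch_is_finalized(False, is_outer_query=True)
```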
The second item is not likely to matter much. The SQL functions all declare
return type OTHER, which would be usable as an input to any other function that
makes sense and that would be planned into an expression.
So, the main effect of this patch is the first item. To provide backwards
compatibility for anyone who depends on the old behavior, the patch adds a
"sqlFinalizeOuterSketches" query context parameter that restores the old
behavior.
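
For anyone who needs the old behavior, a request body for the SQL API might be built as in the snippet below. This is a minimal sketch under assumptions: the `events` table and `user_id` column are hypothetical, `DS_HLL` is used only as an example of an affected function (this description does not name the functions), and `build_sql_request` is a made-up helper. The resulting JSON would be sent as the body of a POST to the SQL endpoint (`/druid/v2/sql`).

```python
import json


def build_sql_request(sql, finalize_outer_sketches):
    """Build a Druid SQL request body with the new context parameter set."""
    return {
        "query": sql,
        "context": {
            # True restores the old behavior: sketches at the outer level
            # of the query are finalized into numbers.
            "sqlFinalizeOuterSketches": finalize_outer_sketches,
        },
    }


# Hypothetical table and column names.
sql = 'SELECT DS_HLL("user_id") AS users_sketch FROM "events"'
payload = build_sql_request(sql, finalize_outer_sketches=True)

print(json.dumps(payload, indent=2))
```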
Other changes:
1) Move various argument-checking logic from runtime to planning time in
DoublesSketchListArgBaseOperatorConversion, by adding an
OperandTypeChecker.
2) Add various JsonIgnores to the sketches to simplify their JSON
representations.
3) Allow chaining of ExpressionPostAggregators and other PostAggregators
in the SQL layer.
4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer,
now that expressions can operate on complex inputs.
5) Adjust return type to thetaSketch (instead of OTHER) in
ThetaSketchSetBaseOperatorConversion.