andimiller opened a new pull request, #10427:
URL: https://github.com/apache/pinot/pull/10427
This adds support for `BYTES` columns containing Tuple Sketches with Integer
as the summary type.
The added classes currently support `Sum` as the semigroup, but are generic
so others can be added.
Feature breakdown:
1. Add transform functions that can be used to create Integer Tuple Sketches
during ingestion, e.g. `toIntegerSumTupleSketch(colA, colB, 16)`
2. Add codecs that use the DataSketches serialization format
3. Add aggregation functions:
* `DISTINCT_COUNT_TUPLE_SKETCH` returns only the estimated number of
unique keys, the same as Theta or HLL
* `DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH` will merge the sketches
using `Sum` as the semigroup and return the raw sketch
* `SUM_VALUES_INTEGER_SUM_TUPLE_SKETCH` will merge the sketches using
`Sum` as the semigroup and estimate the sum of the value side
* `AVG_VALUES_INTEGER_SUM_TUPLE_SKETCH` will merge the sketches using
`Sum` as the semigroup and estimate the average of the value side
4. Add `ValueAggregator<_, _>`s for use in `StarTree` indexes for all four
aggregations above
5. Add `ValueAggregator`s for use in rollups for all four aggregations above
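
To illustrate what the aggregations in item 3 are meant to compute, here is a minimal, exact (non-probabilistic) model in plain Java. It is not the code in this PR and does not use the DataSketches library: a `Map<String, Integer>` stands in for a sketch, and the class and method names (`IntegerSumTupleModel`, `mergeWithSum`, `distinctCount`, `sumValues`, `avgValues`) are hypothetical. It also assumes the average is taken per distinct key, which the description above does not state explicitly. A real Tuple Sketch returns estimates of these quantities rather than exact values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Exact stand-in for an Integer Tuple Sketch: each key maps to an integer summary.
// Hypothetical illustration only; a real sketch keeps a bounded sample of keys.
final class IntegerSumTupleModel {

    // Merge several "sketches" using Sum as the semigroup:
    // summaries that share a key are added together.
    static Map<String, Integer> mergeWithSum(List<Map<String, Integer>> sketches) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> sketch : sketches) {
            sketch.forEach((key, value) -> merged.merge(key, value, Integer::sum));
        }
        return merged;
    }

    // Analogue of DISTINCT_COUNT_TUPLE_SKETCH: number of unique keys.
    static long distinctCount(Map<String, Integer> merged) {
        return merged.size();
    }

    // Analogue of SUM_VALUES_INTEGER_SUM_TUPLE_SKETCH: sum of the value side.
    static long sumValues(Map<String, Integer> merged) {
        long total = 0;
        for (int value : merged.values()) {
            total += value;
        }
        return total;
    }

    // Analogue of AVG_VALUES_INTEGER_SUM_TUPLE_SKETCH.
    // Assumption: the average is the summed value side divided by the distinct key count.
    static double avgValues(Map<String, Integer> merged) {
        return merged.isEmpty() ? 0.0 : (double) sumValues(merged) / merged.size();
    }

    public static void main(String[] args) {
        // Two per-segment "sketches" that overlap on key u2.
        Map<String, Integer> segmentA = Map.of("u1", 3, "u2", 5);
        Map<String, Integer> segmentB = Map.of("u2", 2, "u3", 8);

        Map<String, Integer> merged = mergeWithSum(List.of(segmentA, segmentB));
        System.out.println("distinct keys: " + distinctCount(merged));
        System.out.println("sum of values: " + sumValues(merged));
        System.out.println("avg of values: " + avgValues(merged));
    }
}
```

Because summaries for a shared key are combined with `Sum`, the overlapping key contributes once to the distinct count while both of its values contribute to the sum.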
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]