richardstartin commented on a change in pull request #8351:
URL: https://github.com/apache/pinot/pull/8351#discussion_r826732654
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/operator/query/DictionaryBasedAggregationOperator.java
##########
@@ -76,18 +80,52 @@ protected IntermediateResultsBlock getNextBlock() {
int dictionarySize = dictionary.length();
switch (aggregationFunction.getType()) {
case MIN:
+ case MINMV:
aggregationResults.add(toDouble(dictionary.getMinVal()));
break;
case MAX:
+ case MAXMV:
aggregationResults.add(toDouble(dictionary.getMaxVal()));
break;
case MINMAXRANGE:
+ case MINMAXRANGEMV:
aggregationResults.add(
new MinMaxRangePair(toDouble(dictionary.getMinVal()),
toDouble(dictionary.getMaxVal())));
break;
case DISTINCTCOUNT:
+ case DISTINCTCOUNTMV:
aggregationResults.add(getDistinctValueSet(dictionary));
break;
+ case DISTINCTCOUNTHLL:
+ case DISTINCTCOUNTHLLMV:
+ case DISTINCTCOUNTRAWHLL:
+ case DISTINCTCOUNTRAWHLLMV: {
+ HyperLogLog hll;
+ if (dictionary.getValueType() == FieldSpec.DataType.BYTES) {
+ // Treat BYTES value as serialized HyperLogLog
+ try {
+ hll =
ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(dictionary.getBytesValue(0));
+ for (int dictId = 1; dictId < dictionarySize; dictId++) {
+
hll.addAll(ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(dictionary.getBytesValue(dictId)));
+ }
+ } catch (Exception e) {
+ throw new RuntimeException("Caught exception while merging
HyperLogLogs", e);
+ }
+ } else {
+ int log2m;
+ if (aggregationFunction instanceof
DistinctCountHLLAggregationFunction) {
+ log2m = ((DistinctCountHLLAggregationFunction)
aggregationFunction).getLog2m();
+ } else {
+ log2m = ((DistinctCountRawHLLAggregationFunction)
aggregationFunction).getLog2m();
+ }
+ hll = new HyperLogLog(log2m);
+ for (int dictId = 0; dictId < dictionarySize; dictId++) {
+ hll.offer(dictionary.get(dictId));
+ }
+ }
+ aggregationResults.add(hll);
+ break;
+ }
Review comment:
By virtue of it taking 25 lines, I think we can say this logic is
nontrivial. Embedding this logic ere makes it harder for the reader to quickly
see which keys are handled by this switch statement (needs to search through
this block of code for the `break`). Since which keys are actually handled in
this switch statement is informative to the reader, we should factor out
nontrivial blocks into well named methods to separate contexts. I suggest doing
the same with the `DISTINCTCOUNTSMARTHLL` below, so this switch statement just
switches on the function type, creates the function at a high level, and adds
it to the list. Curious readers can drill in to the more complex construction
routines if they ever need to.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]