[GitHub] [pinot] richardstartin commented on a change in pull request #8351: Add more aggregations that can be solved with dictionary

GitBox Tue, 15 Mar 2022 02:07:36 -0700


richardstartin commented on a change in pull request #8351:
URL: https://github.com/apache/pinot/pull/8351#discussion_r826732654




##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/operator/query/DictionaryBasedAggregationOperator.java
##########
@@ -76,18 +80,52 @@ protected IntermediateResultsBlock getNextBlock() {
       int dictionarySize = dictionary.length();
       switch (aggregationFunction.getType()) {
         case MIN:
+        case MINMV:
           aggregationResults.add(toDouble(dictionary.getMinVal()));
           break;
         case MAX:
+        case MAXMV:
           aggregationResults.add(toDouble(dictionary.getMaxVal()));
           break;
         case MINMAXRANGE:
+        case MINMAXRANGEMV:
           aggregationResults.add(
               new MinMaxRangePair(toDouble(dictionary.getMinVal()), 
toDouble(dictionary.getMaxVal())));
           break;
         case DISTINCTCOUNT:
+        case DISTINCTCOUNTMV:
           aggregationResults.add(getDistinctValueSet(dictionary));
           break;
+        case DISTINCTCOUNTHLL:
+        case DISTINCTCOUNTHLLMV:
+        case DISTINCTCOUNTRAWHLL:
+        case DISTINCTCOUNTRAWHLLMV: {
+          HyperLogLog hll;
+          if (dictionary.getValueType() == FieldSpec.DataType.BYTES) {
+            // Treat BYTES value as serialized HyperLogLog
+            try {
+              hll = 
ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(dictionary.getBytesValue(0));
+              for (int dictId = 1; dictId < dictionarySize; dictId++) {
+                
hll.addAll(ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(dictionary.getBytesValue(dictId)));
+              }
+            } catch (Exception e) {
+              throw new RuntimeException("Caught exception while merging 
HyperLogLogs", e);
+            }
+          } else {
+            int log2m;
+            if (aggregationFunction instanceof 
DistinctCountHLLAggregationFunction) {
+              log2m = ((DistinctCountHLLAggregationFunction) 
aggregationFunction).getLog2m();
+            } else {
+              log2m = ((DistinctCountRawHLLAggregationFunction) 
aggregationFunction).getLog2m();
+            }
+            hll = new HyperLogLog(log2m);
+            for (int dictId = 0; dictId < dictionarySize; dictId++) {
+              hll.offer(dictionary.get(dictId));
+            }
+          }
+          aggregationResults.add(hll);
+          break;
+        }

Review comment:
       By virtue of it taking 25 lines, I think we can say this logic is 
nontrivial. Embedding this logic ere makes it harder for the reader to quickly 
see which keys are handled by this switch statement (needs to search through 
this block of code for the `break`). Since which keys are actually handled in 
this switch statement is informative to the reader, we should factor out 
nontrivial blocks into well named methods to separate contexts. I suggest doing 
the same with the `DISTINCTCOUNTSMARTHLL` below, so this switch statement just 
switches on the function type, creates the function at a high level, and adds 
it to the list. Curious readers can drill in to the more complex construction 
routines if they ever need to.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [pinot] richardstartin commented on a change in pull request #8351: Add more aggregations that can be solved with dictionary

Reply via email to