gianm commented on code in PR #12855:
URL: https://github.com/apache/druid/pull/12855#discussion_r937259000
##########
processing/src/main/java/org/apache/druid/query/topn/NumericTopNMetricSpec.java:
##########
@@ -135,12 +133,7 @@ public TopNResultBuilder getResultBuilder(
@Override
public byte[] getCacheKey()
{
- byte[] metricBytes = StringUtils.toUtf8(metric);
-
- return ByteBuffer.allocate(1 + metricBytes.length)
- .put(CACHE_TYPE_ID)
- .put(metricBytes)
- .array();
+ return new byte[] {CACHE_TYPE_ID};
Review Comment:
Isn't the name of the column important?
##########
processing/src/main/java/org/apache/druid/query/topn/TopNQueryQueryToolChest.java:
##########
@@ -292,19 +292,13 @@ public byte[] computeCacheKey(TopNQuery query)
{
final CacheKeyBuilder builder = new CacheKeyBuilder(TOPN_QUERY)
.appendCacheable(query.getDimensionSpec())
- .appendCacheable(query.getTopNMetricSpec())
.appendInt(query.getThreshold())
.appendCacheable(query.getGranularity())
.appendCacheable(query.getDimensionsFilter())
.appendCacheables(query.getAggregatorSpecs())
- .appendCacheable(query.getVirtualColumns());
-
- final List<PostAggregator> postAggregators = prunePostAggregators(query);
- if (!postAggregators.isEmpty()) {
- // Append post aggregators only when they are used as sort keys.
- // Note that appending an empty list produces a different cache key from not appending it.
- builder.appendCacheablesIgnoringOrder(postAggregators);
- }
+ .appendCacheable(query.getVirtualColumns())
Review Comment:
Similar issue to the one with groupBy (postaggs refer to
dim/agg/other-postagg by name).
##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChest.java:
##########
@@ -556,9 +564,37 @@ public byte[] computeResultLevelCacheKey(GroupByQuery query)
builder.appendStrings(subTotalSpec);
}
}
+ if (query.getLimitSpec() instanceof DefaultLimitSpec) {
Review Comment:
This is dealing with the orderbys, but I think there's still an issue with
postaggregators in groupBy result-level cache keys: they refer to aggregators
and dimensions (and other post-aggregators) by name, but the names aren't
stored in the cache key.
I think we can deal with that by storing `List<String>`s corresponding to
the names of all dims, aggs, and postaggs. That's equivalent to adding the name
to each of those items' cache keys.
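The `List<String>` idea could look something like this minimal, self-contained sketch (`NameKeySketch` and `appendNames` are hypothetical names for illustration, not Druid APIs). Length-prefixing each name keeps lists like `["ab", "c"]` and `["a", "bc"]` from encoding to the same bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: encode the output names of all dims, aggs, and
// postaggs into the result-level cache key, so renaming any of them
// produces a different key.
class NameKeySketch
{
  // Length-prefix each name so concatenation is unambiguous.
  static byte[] appendNames(List<String> names)
  {
    int size = 0;
    byte[][] utf8 = new byte[names.size()][];
    for (int i = 0; i < names.size(); i++) {
      utf8[i] = names.get(i).getBytes(StandardCharsets.UTF_8);
      size += Integer.BYTES + utf8[i].length;
    }
    final ByteBuffer buf = ByteBuffer.allocate(size);
    for (byte[] bytes : utf8) {
      buf.putInt(bytes.length).put(bytes);
    }
    return buf.array();
  }
}
```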
##########
processing/src/main/java/org/apache/druid/query/topn/TopNQueryQueryToolChest.java:
##########
@@ -417,6 +412,28 @@ public Result<TopNResultValue> apply(Object input)
}
};
}
+
+ private void appendOrderingCachable(TopNQuery query, CacheKeyBuilder builder)
Review Comment:
appendOrderingCacheable (spelling)
##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChest.java:
##########
@@ -556,9 +564,37 @@ public byte[] computeResultLevelCacheKey(GroupByQuery query)
builder.appendStrings(subTotalSpec);
}
}
+ if (query.getLimitSpec() instanceof DefaultLimitSpec) {
+ DefaultLimitSpec limitSpec = (DefaultLimitSpec) query.getLimitSpec();
+ for (OrderByColumnSpec column : limitSpec.getColumns()) {
+ appendOrderingColumnCacheable(query, builder, column.getDimension());
+ }
+ }
return builder.build();
}
+ private void appendOrderingColumnCacheable(GroupByQuery query, CacheKeyBuilder builder, String sortColumn)
+ {
+ for (DimensionSpec dim : query.getDimensions()) {
+ if (sortColumn.equals(dim.getOutputName())) {
+ builder.appendCacheable(dim);
Review Comment:
I think the cache IDs for all of these things (dim spec, aggregator factory, post aggregator) can collide across types: a dim spec can have the same ID as an aggregator factory, and then the resulting key bytes would be identical. So we should prepend another byte that says what kind of thing follows.
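One way to implement the disambiguating byte, as a self-contained sketch (the `TypedKeySketch` class and `KIND_*` constants are hypothetical illustrations, not Druid's actual `CacheKeyBuilder`):

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: prefix each appended component with a byte that
// identifies its kind, so a DimensionSpec and an AggregatorFactory that
// happen to share an internal cache ID can't yield identical key bytes.
class TypedKeySketch
{
  static final byte KIND_DIMENSION = 0x01;
  static final byte KIND_AGGREGATOR = 0x02;
  static final byte KIND_POST_AGGREGATOR = 0x03;

  private final ByteArrayOutputStream out = new ByteArrayOutputStream();

  TypedKeySketch append(byte kind, byte[] componentKey)
  {
    out.write(kind);  // disambiguating type byte comes first
    out.write(componentKey, 0, componentKey.length);
    return this;
  }

  byte[] build()
  {
    return out.toByteArray();
  }
}
```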
##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChest.java:
##########
@@ -533,6 +535,12 @@ public byte[] computeCacheKey(GroupByQuery query)
.appendCacheable(query.getVirtualColumns());
if (query.isApplyLimitPushDown()) {
builder.appendCacheable(query.getLimitSpec());
+ if (query.getLimitSpec() instanceof DefaultLimitSpec) {
+ DefaultLimitSpec limitSpec = (DefaultLimitSpec) query.getLimitSpec();
+ for (OrderByColumnSpec column : limitSpec.getColumns()) {
+ appendOrderingColumnCacheable(query, builder, column.getDimension());
Review Comment:
`appendOrderingColumnCacheable` is always called with the same pattern:
perhaps it should do the limit-spec stuff inside itself? i.e. it accepts just
`query` and `builder`.
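A sketch of that refactor in Druid terms (untested, just illustrating the suggested signature; the plural helper name is my invention):

```java
// Hypothetical refactor: the helper inspects the limit spec itself,
// so every caller just passes (query, builder).
private void appendOrderingColumnsCacheable(GroupByQuery query, CacheKeyBuilder builder)
{
  if (query.getLimitSpec() instanceof DefaultLimitSpec) {
    final DefaultLimitSpec limitSpec = (DefaultLimitSpec) query.getLimitSpec();
    for (OrderByColumnSpec column : limitSpec.getColumns()) {
      appendOrderingColumnCacheable(query, builder, column.getDimension());
    }
  }
}
```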
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]