gianm commented on code in PR #12855:
URL: https://github.com/apache/druid/pull/12855#discussion_r937259000
##########
processing/src/main/java/org/apache/druid/query/topn/NumericTopNMetricSpec.java:
##########
@@ -135,12 +133,7 @@ public TopNResultBuilder getResultBuilder(
@Override
public byte[] getCacheKey()
{
- byte[] metricBytes = StringUtils.toUtf8(metric);
-
- return ByteBuffer.allocate(1 + metricBytes.length)
- .put(CACHE_TYPE_ID)
- .put(metricBytes)
- .array();
+ return new byte[] {CACHE_TYPE_ID};
Review Comment:
Isn't the name of the column important?
##########
processing/src/main/java/org/apache/druid/query/topn/TopNQueryQueryToolChest.java:
##########
@@ -292,19 +292,13 @@ public byte[] computeCacheKey(TopNQuery query)
{
final CacheKeyBuilder builder = new CacheKeyBuilder(TOPN_QUERY)
.appendCacheable(query.getDimensionSpec())
- .appendCacheable(query.getTopNMetricSpec())
.appendInt(query.getThreshold())
.appendCacheable(query.getGranularity())
.appendCacheable(query.getDimensionsFilter())
.appendCacheables(query.getAggregatorSpecs())
- .appendCacheable(query.getVirtualColumns());
-
- final List<PostAggregator> postAggregators = prunePostAggregators(query);
- if (!postAggregators.isEmpty()) {
- // Append post aggregators only when they are used as sort keys.
- // Note that appending an empty list produces a different cache key from not appending it.
- builder.appendCacheablesIgnoringOrder(postAggregators);
- }
+ .appendCacheable(query.getVirtualColumns())
Review Comment:
Similar issue to the one with groupBy (postaggs refer to
dim/agg/other-postagg by name).
##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChest.java:
##########
@@ -556,9 +564,37 @@ public byte[] computeResultLevelCacheKey(GroupByQuery query)
builder.appendStrings(subTotalSpec);
}
}
+ if (query.getLimitSpec() instanceof DefaultLimitSpec) {
Review Comment:
This is dealing with the orderbys, but I think there's still an issue with
postaggregators in groupBy result-level cache keys: they refer to aggregators
and dimensions (and other post-aggregators) by name, but the names aren't
stored in the cache key.
I think we can deal with that by storing `List<String>`s corresponding to
the names of all dims, aggs, and postaggs. That's equivalent to adding the name
to each of those items' cache keys.
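The `List<String>` idea could look something like this minimal, self-contained sketch (`NameKeySketch` and `appendNames` are hypothetical names for illustration, not Druid APIs). Length-prefixing each name keeps lists like `["ab", "c"]` and `["a", "bc"]` from encoding to the same bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: encode the output names of all dims, aggs, and
// postaggs into the result-level cache key, so renaming any of them
// produces a different key.
class NameKeySketch
{
  // Length-prefix each name so concatenation is unambiguous.
  static byte[] appendNames(List<String> names)
  {
    int size = 0;
    byte[][] utf8 = new byte[names.size()][];
    for (int i = 0; i < names.size(); i++) {
      utf8[i] = names.get(i).getBytes(StandardCharsets.UTF_8);
      size += Integer.BYTES + utf8[i].length;
    }
    final ByteBuffer buf = ByteBuffer.allocate(size);
    for (byte[] bytes : utf8) {
      buf.putInt(bytes.length).put(bytes);
    }
    return buf.array();
  }
}
```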
##########
processing/src/main/java/org/apache/druid/query/topn/TopNQueryQueryToolChest.java:
##########
@@ -417,6 +412,28 @@ public Result<TopNResultValue> apply(Object input)
}
};
}
+
+ private void appendOrderingCachable(TopNQuery query, CacheKeyBuilder builder)
Review Comment:
appendOrderingCacheable (spelling)
##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChest.java:
##########
@@ -556,9 +564,37 @@ public byte[] computeResultLevelCacheKey(GroupByQuery query)
builder.appendStrings(subTotalSpec);
}
}
+ if (query.getLimitSpec() instanceof DefaultLimitSpec) {
+ DefaultLimitSpec limitSpec = (DefaultLimitSpec) query.getLimitSpec();
+ for (OrderByColumnSpec column : limitSpec.getColumns()) {
+ appendOrderingColumnCacheable(query, builder, column.getDimension());
+ }
+ }
return builder.build();
}
+ private void appendOrderingColumnCacheable(GroupByQuery query, CacheKeyBuilder builder, String sortColumn)
+ {
+ for (DimensionSpec dim : query.getDimensions()) {
+ if (sortColumn.equals(dim.getOutputName())) {
+ builder.appendCacheable(dim);
Review Comment:
I think the cache IDs for all of these things (dim spec, aggregator factory, post aggregator) can collide across types: a dim spec can have the same ID as an aggregator factory, and then the resulting key bytes would be identical. So we should prepend another byte that says what kind of thing follows.
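One way to implement the disambiguating byte, as a self-contained sketch (the `TypedKeySketch` class and `KIND_*` constants are hypothetical illustrations, not Druid's actual `CacheKeyBuilder`):

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: prefix each appended component with a byte that
// identifies its kind, so a DimensionSpec and an AggregatorFactory that
// happen to share an internal cache ID can't yield identical key bytes.
class TypedKeySketch
{
  static final byte KIND_DIMENSION = 0x01;
  static final byte KIND_AGGREGATOR = 0x02;
  static final byte KIND_POST_AGGREGATOR = 0x03;

  private final ByteArrayOutputStream out = new ByteArrayOutputStream();

  TypedKeySketch append(byte kind, byte[] componentKey)
  {
    out.write(kind);  // disambiguating type byte comes first
    out.write(componentKey, 0, componentKey.length);
    return this;
  }

  byte[] build()
  {
    return out.toByteArray();
  }
}
```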
##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChest.java:
##########
@@ -533,6 +535,12 @@ public byte[] computeCacheKey(GroupByQuery query)
.appendCacheable(query.getVirtualColumns());
if (query.isApplyLimitPushDown()) {
builder.appendCacheable(query.getLimitSpec());
+ if (query.getLimitSpec() instanceof DefaultLimitSpec) {
+ DefaultLimitSpec limitSpec = (DefaultLimitSpec) query.getLimitSpec();
+ for (OrderByColumnSpec column : limitSpec.getColumns()) {
+ appendOrderingColumnCacheable(query, builder, column.getDimension());
Review Comment:
`appendOrderingColumnCacheable` is always called with the same pattern:
perhaps it should do the limit-spec stuff inside itself? i.e. it accepts just
`query` and `builder`.
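A sketch of that refactor in Druid terms (untested, just illustrating the suggested signature; the plural helper name is my invention):

```java
// Hypothetical refactor: the helper inspects the limit spec itself,
// so every caller just passes (query, builder).
private void appendOrderingColumnsCacheable(GroupByQuery query, CacheKeyBuilder builder)
{
  if (query.getLimitSpec() instanceof DefaultLimitSpec) {
    final DefaultLimitSpec limitSpec = (DefaultLimitSpec) query.getLimitSpec();
    for (OrderByColumnSpec column : limitSpec.getColumns()) {
      appendOrderingColumnCacheable(query, builder, column.getDimension());
    }
  }
}
```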
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]