Copilot commented on code in PR #18103:
URL: https://github.com/apache/pinot/pull/18103#discussion_r3041351608
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/StarTreeBuilderUtils.java:
##########
@@ -395,6 +396,12 @@ public static List<ExpressionContext>
expressionContextFromFunctionParameters(
expressionContexts.add(ExpressionContext.forLiteral(
Literal.intValue(Integer.parseInt(String.valueOf(
functionParameters.get(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY))))));
+ } else {
+ // Always inject the default compression so it is stored in
star-tree metadata.
+ // This allows canUseStarTree() to distinguish old segments (no
metadata, built with compression=100)
+ // from new segments (metadata present, built with the current
default).
Review Comment:
The comment here says injecting the default compression will be “stored in
star-tree metadata” and used by canUseStarTree() to distinguish old vs new
segments. However, this method only returns ExpressionContext arguments for
ValueAggregatorFactory; star-tree metadata persistence is based on
AggregationSpec.getFunctionParameters() (see StarTreeV2Metadata.writeMetadata).
If AggregationSpec.functionParameters is empty (e.g., aggregationConfigs
without functionParameters), metadata will still omit compressionFactor and
canUseStarTree() will treat it as default=100 while the builder will create
digests at DEFAULT_TDIGEST_COMPRESSION=500. Consider fixing this by ensuring
AggregationSpec.functionParameters always includes compressionFactor for
PERCENTILETDIGEST/PERCENTILERAWTDIGEST when absent, and adjust/remove this
comment accordingly.
```suggestion
// Persist the default compression in functionParameters so
downstream star-tree metadata writers
// record the same compression used by the builder when the config
omits it.
functionParameters.put(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY,
PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION);
```
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/aggregator/PercentileTDigestValueAggregatorTest.java:
##########
@@ -0,0 +1,285 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.aggregator;
+
+import com.tdunning.math.stats.TDigest;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Random;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertTrue;
+
+
+/**
+ * Tests that the {@link PercentileTDigestValueAggregator} produces accurate
quantile estimates
+ * under production-like conditions using the default compression (100).
+ *
+ * <p>These tests simulate single-path production behavior — where raw values
are ingested into
+ * a TDigest, serialized/deserialized through segment and star-tree building,
and then queried.
Review Comment:
The class/Javadoc and nearby comments say the tests run with the default
compression of 100, but
PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION was changed to
500. This makes the test documentation misleading (and the stated expected
error characteristics may not apply). Update the comments to reflect the actual
default compression being exercised (or explicitly construct the aggregator
with compression=100 if that’s what this test intends to validate).
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/StarTreeV2BuilderConfig.java:
##########
@@ -323,4 +326,22 @@ public String toString() {
.append("skipStarNodeCreation", _skipStarNodeCreationForDimensions)
.append("aggregationSpecs",
_aggregationSpecs).append("maxLeafRecords", _maxLeafRecords).toString();
}
+
+ /**
+ * Returns the default {@link AggregationSpec} for the given function type.
For PERCENTILETDIGEST and
+ * PERCENTILERAWTDIGEST, the default compression factor is explicitly
included in the function parameters so that
+ * it is persisted in star-tree metadata. This allows {@code
canUseStarTree()} to distinguish old segments (built
+ * with the previous default of 100, which have no compression metadata)
from new segments.
+ */
+ private static AggregationSpec
getDefaultAggregationSpec(AggregationFunctionType functionType) {
+ switch (functionType) {
+ case PERCENTILETDIGEST:
+ case PERCENTILERAWTDIGEST:
+ return new AggregationSpec(null, null, null, null, null,
+ Map.of(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY,
+ PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION));
+ default:
+ return AggregationSpec.DEFAULT;
+ }
Review Comment:
By persisting
compressionFactor=PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION
(500) into star-tree metadata,
PercentileTDigestAggregationFunction.canUseStarTree() will reject using the
star-tree for queries that omit the compression argument (query-time default is
still 100). This means percentiletdigest(col, p) queries will stop using the
star-tree after segments are rebuilt unless users update queries to specify
compression=500 (or canUseStarTree is relaxed to allow higher-compression
star-trees). If this behavior change is intended, it would be good to make it
explicit in code/comments and ensure there’s a migration path for existing
star-tree users.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]