Re: [PR] Upgrade t-digest from 3.2 to 3.3 with error rate fix [pinot]

via GitHub Mon, 06 Apr 2026 12:47:40 -0700


Copilot commented on code in PR #18103:
URL: https://github.com/apache/pinot/pull/18103#discussion_r3041351608



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/StarTreeBuilderUtils.java:
##########
@@ -395,6 +396,12 @@ public static List<ExpressionContext> 
expressionContextFromFunctionParameters(
           expressionContexts.add(ExpressionContext.forLiteral(
               Literal.intValue(Integer.parseInt(String.valueOf(
                   
functionParameters.get(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY))))));
+        } else {
+          // Always inject the default compression so it is stored in 
star-tree metadata.
+          // This allows canUseStarTree() to distinguish old segments (no 
metadata, built with compression=100)
+          // from new segments (metadata present, built with the current 
default).

Review Comment:
   The comment here says injecting the default compression will be “stored in 
star-tree metadata” and used by canUseStarTree() to distinguish old vs new 
segments. However, this method only returns ExpressionContext arguments for 
ValueAggregatorFactory; star-tree metadata persistence is based on 
AggregationSpec.getFunctionParameters() (see StarTreeV2Metadata.writeMetadata). 
If AggregationSpec.functionParameters is empty (e.g., aggregationConfigs 
without functionParameters), metadata will still omit compressionFactor and 
canUseStarTree() will treat it as default=100 while the builder will create 
digests at DEFAULT_TDIGEST_COMPRESSION=500. Consider fixing this by ensuring 
AggregationSpec.functionParameters always includes compressionFactor for 
PERCENTILETDIGEST/PERCENTILERAWTDIGEST when absent, and adjust/remove this 
comment accordingly.
   ```suggestion
             // Persist the default compression in functionParameters so 
downstream star-tree metadata writers
             // record the same compression used by the builder when the config 
omits it.
             
functionParameters.put(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY,
                 PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION);
   ```



##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/aggregator/PercentileTDigestValueAggregatorTest.java:
##########
@@ -0,0 +1,285 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.aggregator;
+
+import com.tdunning.math.stats.TDigest;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Random;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertTrue;
+
+
+/**
+ * Tests that the {@link PercentileTDigestValueAggregator} produces accurate 
quantile estimates
+ * under production-like conditions using the default compression (100).
+ *
+ * <p>These tests simulate single-path production behavior — where raw values 
are ingested into
+ * a TDigest, serialized/deserialized through segment and star-tree building, 
and then queried.

Review Comment:
   The class/Javadoc and nearby comments say the tests run with the default 
compression of 100, but 
PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION was changed to 
500. This makes the test documentation misleading (and the stated expected 
error characteristics may not apply). Update the comments to reflect the actual 
default compression being exercised (or explicitly construct the aggregator 
with compression=100 if that’s what this test intends to validate).



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/StarTreeV2BuilderConfig.java:
##########
@@ -323,4 +326,22 @@ public String toString() {
         .append("skipStarNodeCreation", _skipStarNodeCreationForDimensions)
         .append("aggregationSpecs", 
_aggregationSpecs).append("maxLeafRecords", _maxLeafRecords).toString();
   }
+
+  /**
+   * Returns the default {@link AggregationSpec} for the given function type. 
For PERCENTILETDIGEST and
+   * PERCENTILERAWTDIGEST, the default compression factor is explicitly 
included in the function parameters so that
+   * it is persisted in star-tree metadata. This allows {@code 
canUseStarTree()} to distinguish old segments (built
+   * with the previous default of 100, which have no compression metadata) 
from new segments.
+   */
+  private static AggregationSpec 
getDefaultAggregationSpec(AggregationFunctionType functionType) {
+    switch (functionType) {
+      case PERCENTILETDIGEST:
+      case PERCENTILERAWTDIGEST:
+        return new AggregationSpec(null, null, null, null, null,
+            Map.of(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY,
+                PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION));
+      default:
+        return AggregationSpec.DEFAULT;
+    }

Review Comment:
   By persisting 
compressionFactor=PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION 
(500) into star-tree metadata, 
PercentileTDigestAggregationFunction.canUseStarTree() will reject using the 
star-tree for queries that omit the compression argument (query-time default is 
still 100). This means percentiletdigest(col, p) queries will stop using the 
star-tree after segments are rebuilt unless users update queries to specify 
compression=500 (or canUseStarTree is relaxed to allow higher-compression 
star-trees). If this behavior change is intended, it would be good to make it 
explicit in code/comments and ensure there’s a migration path for existing 
star-tree users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Upgrade t-digest from 3.2 to 3.3 with error rate fix [pinot]

Reply via email to