Jackie-Jiang commented on code in PR #18170:
URL: https://github.com/apache/pinot/pull/18170#discussion_r3068908436


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/converter/stats/MutableColumnStatistics.java:
##########
@@ -100,14 +122,24 @@ private void collectElementLengthIfNeeded() {
       _maxElementLength = length;
     } else {
       // If the stored type is not fixed width, iterate over the dictionary to find the min/max element length
+      // TODO: Collect these stats within Dictionary to avoid an extra scan
       _minElementLength = Integer.MAX_VALUE;
       _maxElementLength = 0;
+      boolean isAscii = valueType == DataType.STRING;
       int length = _dictionary.length();
       for (int i = 0; i < length; i++) {
-        int elementLength = _dictionary.getValueSize(i);
-        _minElementLength = Math.min(_minElementLength, elementLength);
-        _maxElementLength = Math.max(_maxElementLength, elementLength);
+        if (isAscii) {
+          byte[] bytes = _dictionary.getBytesValue(i);
+          _minElementLength = Math.min(_minElementLength, bytes.length);
+          _maxElementLength = Math.max(_maxElementLength, bytes.length);
+          isAscii = Utf8Utils.isAscii(bytes);
+        } else {
+          int elementLength = _dictionary.getValueSize(i);
+          _minElementLength = Math.min(_minElementLength, elementLength);
+          _maxElementLength = Math.max(_maxElementLength, elementLength);
+        }
       }
+      _isAscii = isAscii;

Review Comment:
   I don't follow. Why is it only reflecting the last entry?
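
For anyone tracing the question, here is a minimal standalone sketch of the flag pattern in the hunk above. It is not the PR's code: `AsciiLatchSketch`, `allAscii`, and the local `isAscii` helper are hypothetical stand-ins (the real `Utf8Utils.isAscii` and `_dictionary` are not reproduced here), and it assumes the helper returns true only when every byte is 7-bit. Because the check is guarded by `if (isAscii)`, the flag appears to latch to false on the first non-ASCII entry rather than follow the last one.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class AsciiLatchSketch {
  // Hypothetical stand-in for Utf8Utils.isAscii: true if every byte is 7-bit.
  static boolean isAscii(byte[] bytes) {
    for (byte b : bytes) {
      if (b < 0) {
        return false;
      }
    }
    return true;
  }

  // Same loop shape as the hunk: the check only runs while the flag is still true,
  // so once an entry fails, later entries cannot flip it back.
  static boolean allAscii(List<String> values) {
    boolean isAscii = true;
    for (String value : values) {
      if (isAscii) {
        isAscii = isAscii(value.getBytes(StandardCharsets.UTF_8));
      }
    }
    return isAscii;
  }

  public static void main(String[] args) {
    // Non-ASCII entry in the middle, ASCII entry last.
    System.out.println(allAscii(List.of("abc", "\u00e9", "xyz")));
    // All entries ASCII.
    System.out.println(allAscii(List.of("abc", "xyz")));
  }
}
```

In this sketch the first call yields false even though the last entry is plain ASCII, which is the behavior the guard is meant to give.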



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

