Re: [PR] [fix](TIMESTAMPTZ)Prevents timezone-unsafe reparsing using getStringValue() [doris]

via GitHub Thu, 11 Jun 2026 04:11:24 -0700


morrySnow commented on code in PR #64026:
URL: https://github.com/apache/doris/pull/64026#discussion_r3395364626



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/PartitionValue.java:
##########
@@ -17,42 +17,40 @@
 
 package org.apache.doris.analysis;
 
-import org.apache.doris.catalog.Type;
-import org.apache.doris.common.AnalysisException;
-
 import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
 
 public class PartitionValue {
     public static final PartitionValue MAX_VALUE = new PartitionValue();
 
-    private String value;
+    private LiteralExpr value;
+    private String stringValue;
     private boolean isNullPartition = false;
 
     private PartitionValue() {
-
+        this.value = MaxLiteral.MAX_VALUE;
+        this.stringValue = "MAXVALUE";
     }
 
-    public PartitionValue(String value) {
-        this.value = value;
+    public PartitionValue(LiteralExpr value) {
+        this(value, false, value.getStringValue());
     }
 
-    public PartitionValue(Long value) {
-        this.value = value.toString();
+    public PartitionValue(LiteralExpr value, boolean isNullPartition) {
+        this(value, isNullPartition, value == null ? null : 
value.getStringValue());
     }
 
-    public PartitionValue(String value, boolean isNullPartition) {
-        this.value = value;
+    public PartitionValue(LiteralExpr value, boolean isNullPartition, String 
stringValue) {
+        if (isNullPartition) {
+            Preconditions.checkArgument(value instanceof NullLiteral);
+        }
+        this.value = Preconditions.checkNotNull(value);
         this.isNullPartition = isNullPartition;
+        this.stringValue = stringValue != null ? stringValue : 
this.value.getStringValue();
     }
 
-    public LiteralExpr getValue(Type type) throws AnalysisException {
-        if (isNullPartition) {
-            return new NullLiteral();
-        }
-        if (isMax()) {
-            return LiteralExprUtils.createInfinity(type, true);
-        }
-        return LiteralExprUtils.createLiteral(value, type);
+    public LiteralExpr getValue() {
+        return value;
     }
 
     public boolean isMax() {

Review Comment:
   **Moderate: Including `stringValue` in `equals()` may break null partition 
value equality across code paths.**
   
   For null partition values, `stringValue` differs depending on the 
construction path:
   - Internal FE path (`PartitionInfo.toPartitionValue()`) → `stringValue` = 
`"NULL"` (from `NullLiteral.getStringValue()`)
   - External catalog path (`HiveExternalMetaCache`, `TablePartitionValues`) → 
`stringValue` = original partition string like `"__HIVE_DEFAULT_PARTITION__"`
   
   Two `PartitionValue`s representing the same logically-null partition from 
different sources will now be **inequal** due to this field, even though the 
inner `NullLiteral` values compare correctly (via `compareLiteral` returning 0 
for any `NullLiteral`).
   
   Consider whether `stringValue` is purely metadata for display/serialization 
— if so, it should be excluded from `equals()`/`hashCode()`. If intentional, a 
comment explaining the rationale would help.



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/MultiPartitionDesc.java:
##########
@@ -381,11 +402,74 @@ private static DateTimeFormatter dateFormat(TimeUnit 
timeUnitType,
     }
 
     private DateTimeFormatter dateTypeFormat(String dateTimeStr) {
-        String s = DATE_FORMAT;
-        if (this.timeUnitType.equals(TimeUnit.HOUR) || dateTimeStr.length() == 
19) {
-            s = DATETIME_FORMAT;
+        if (isTimestampTzFormat(dateTimeStr)) {
+            return dateTimeFormatter(stripTimeZone(dateTimeStr));
+        }
+        if (this.timeUnitType.equals(TimeUnit.HOUR) || 
isDateTimeFormat(dateTimeStr)) {
+            return dateTimeFormatter(dateTimeStr);
+        }
+        return DateTimeFormatter.ofPattern(DATE_FORMAT);
+    }
+

Review Comment:
   **Note:** `hasTimeZoneSuffix` detects `+HH:MM` style offsets but won't match 
named timezones (`Asia/Shanghai`) or abbreviated zones (`UTC`). By the time 
values reach `MultiPartitionDesc`, they have already been canonicalized to 
UTC+offset form via `PartitionDefinition.strictTypedPartitionExpression()` → 
`TimestampTzLiteral.fromSessionTimeZone()`, so this is safe in practice. A 
brief comment noting this invariant would help future readers.



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/LiteralExprUtils.java:
##########
@@ -85,6 +97,39 @@ public static LiteralExpr createLiteral(String value, Type 
type) throws Analysis
         return literalExpr;
     }
 
+    private static LiteralExpr createFloatingPointLiteral(String value, Type 
type) throws AnalysisException {
+        try {
+            return new FloatLiteral(Double.parseDouble(value), type);
+        } catch (NumberFormatException e) {
+            throw new AnalysisException("Invalid floating-point literal: " + 
value, e);
+        }
+    }
+
+    private static LiteralExpr createDecimalLiteral(String value, Type type) 
throws AnalysisException {
+        Preconditions.checkArgument(type instanceof ScalarType);
+        ScalarType scalarType = (ScalarType) type;
+        BigDecimal decimalValue;
+        try {
+            decimalValue = new BigDecimal(value);
+        } catch (NumberFormatException e) {
+            throw new AnalysisException("Invalid floating-point literal: " + 
value, e);
+        }
+        decimalValue = decimalValue.setScale(scalarType.getScalarScale(), 
RoundingMode.HALF_UP);

Review Comment:
   **Moderate: Silent rounding via `setScale(scalarType.getScalarScale(), 
RoundingMode.HALF_UP)`.** For partition boundary values, this silently rounds 
e.g. `10.005` → `10.01` for a `DECIMAL(10,2)` column. If a user specifies 
`VALUES LESS THAN ('10.005')` on a `DECIMAL(10,2)` column, the rounded boundary 
(`10.01`) might not be what they intended. Consider whether exceeding the 
column's scale should produce an `AnalysisException` instead.



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/LiteralExprUtils.java:
##########
@@ -85,6 +97,39 @@ public static LiteralExpr createLiteral(String value, Type 
type) throws Analysis
         return literalExpr;
     }
 
+    private static LiteralExpr createFloatingPointLiteral(String value, Type 
type) throws AnalysisException {
+        try {
+            return new FloatLiteral(Double.parseDouble(value), type);
+        } catch (NumberFormatException e) {
+            throw new AnalysisException("Invalid floating-point literal: " + 
value, e);
+        }
+    }
+
+    private static LiteralExpr createDecimalLiteral(String value, Type type) 
throws AnalysisException {
+        Preconditions.checkArgument(type instanceof ScalarType);
+        ScalarType scalarType = (ScalarType) type;
+        BigDecimal decimalValue;
+        try {
+            decimalValue = new BigDecimal(value);
+        } catch (NumberFormatException e) {
+            throw new AnalysisException("Invalid floating-point literal: " + 
value, e);

Review Comment:
   **Minor: Misleading error message.** This says `"Invalid floating-point 
literal"` but the method is `createDecimalLiteral`. Consider changing to 
`"Invalid decimal literal"` for easier debugging.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/literal/DateTimeLiteral.java:
##########
@@ -203,8 +202,13 @@ protected void roundMicroSecond(int scale) {
         this.microSecond = Math.round(this.microSecond / factor) * (int) 
factor;
 
         if (this.microSecond >= 1000000) {
-            LocalDateTime localDateTime = 
DateUtils.getTime(StandardDateFormat.DATE_TIME_FORMATTER_TO_MICRO_SECOND,
-                    getStringValue()).plusSeconds(1);

Review Comment:
   **Good fix:** `roundMicroSecond` now uses internal numeric fields (`year`, 
`month`, `day`…) directly instead of re-parsing via `getStringValue()`. The old 
code was timezone-unsafe because `getStringValue()` includes a timezone offset 
for TIMESTAMPTZ, which the `DATE_TIME_FORMATTER_TO_MICRO_SECOND` formatter 
couldn't parse. The new approach is timezone-agnostic and correctly handles the 
`'9999-12-31 23:59:59.999999+08:00'` boundary overflow.



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/PartitionValue.java:
##########
@@ -63,7 +61,7 @@ public String getStringValue() {
         if (isMax()) {
             return "MAXVALUE";

Review Comment:
   Related to above: if `stringValue` is removed from `equals()`, it must also 
be removed from `hashCode()` to maintain the contract.



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/MultiPartitionDesc.java:
##########
@@ -381,11 +402,74 @@ private static DateTimeFormatter dateFormat(TimeUnit 
timeUnitType,
     }
 
     private DateTimeFormatter dateTypeFormat(String dateTimeStr) {
-        String s = DATE_FORMAT;
-        if (this.timeUnitType.equals(TimeUnit.HOUR) || dateTimeStr.length() == 
19) {
-            s = DATETIME_FORMAT;
+        if (isTimestampTzFormat(dateTimeStr)) {
+            return dateTimeFormatter(stripTimeZone(dateTimeStr));
+        }
+        if (this.timeUnitType.equals(TimeUnit.HOUR) || 
isDateTimeFormat(dateTimeStr)) {
+            return dateTimeFormatter(dateTimeStr);
+        }
+        return DateTimeFormatter.ofPattern(DATE_FORMAT);
+    }
+
+    private String formatPartitionDateTime(LocalDateTime dateTime, String 
originalDateTimeStr) {
+        if (isTimestampTzFormat(originalDateTimeStr)) {
+            return dateTime.format(dateTypeFormat(originalDateTimeStr)) + 
timeZoneSuffix(originalDateTimeStr);
+        }
+        return dateTime.format(dateTypeFormat(originalDateTimeStr));
+    }
+
+    private static boolean isDateTimeFormat(String dateTimeStr) {
+        int length = dateTimeStr.length();
+        return length == 19 || (length >= 21 && length <= 26);
+    }
+
+    private static boolean isTimestampTzFormat(String dateTimeStr) {
+        int length = dateTimeStr.length();
+        return (length == 25 || (length >= 27 && length <= 32)) && 
hasTimeZoneSuffix(dateTimeStr);
+    }
+
+    private static boolean hasTimeZoneSuffix(String dateTimeStr) {
+        return dateTimeStr.length() >= 6
+                && (dateTimeStr.charAt(dateTimeStr.length() - 6) == '+'
+                        || dateTimeStr.charAt(dateTimeStr.length() - 6) == '-')
+                && dateTimeStr.charAt(dateTimeStr.length() - 3) == ':';
+    }
+
+    private static DateTimeFormatter timestampTzFormatter(String dateTimeStr) {
+        return 
DateTimeFormatter.ofPattern(dateTimePattern(stripTimeZone(dateTimeStr)) + 
"XXX");
+    }
+
+    private static DateTimeFormatter dateTimeFormatter(String dateTimeStr) {
+        return DateTimeFormatter.ofPattern(dateTimePattern(dateTimeStr));
+    }
+
+    private static String dateTimePattern(String dateTimeStr) {
+        switch (dateTimeStr.length()) {
+            case 19:
+                return DATETIME_FORMAT;

Review Comment:
   **Minor robustness note:** `stripTimeZone()` unconditionally removes the 
last 6 characters using `substring(0, length - 6)`. This is safe because 
callers guard with `hasTimeZoneSuffix()` / `isTimestampTzFormat()` first, but 
the coupling is fragile — if a future change calls `stripTimeZone()` on an 
unchecked string, it would silently corrupt the value. Consider adding a 
precondition check with 
`Preconditions.checkArgument(hasTimeZoneSuffix(dateTimeStr))` at the top of 
`stripTimeZone()`.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/IcebergUtils.java:
##########
@@ -1744,6 +1754,13 @@ private static Optional<SchemaCacheValue> 
loadTableSchemaCacheValue(ExternalTabl
             Types.NestedField col = 
icebergTable.schema().findField(field.sourceId());
             for (Column c : schema) {
                 if (c.getName().equalsIgnoreCase(col.name())) {
+                    // For partition column, if it is string type, change it 
to varchar(65535)
+                    // to be same as doris managed table.
+                    // This is to avoid some unexpected behavior such as 
different partition pruning result
+                    // between doris managed table and external table.
+                    if (c.getType().getPrimitiveType() == 
PrimitiveType.STRING) {
+                        
c.setType(ScalarType.createVarcharType(ScalarType.MAX_VARCHAR_LENGTH));

Review Comment:
   **Note:** String partition columns are converted from `STRING` to 
`VARCHAR(65533)` to align with Doris managed table behavior. The value `65533` 
= `MAX_VARCHAR_LENGTH(65535) - 2`. Is the `-2` offset intentional (perhaps for 
encoding overhead)? A brief comment explaining the choice would help. Same 
normalization pattern in `PaimonExternalTable.java`.



##########
fe/fe-catalog/src/main/java/org/apache/doris/analysis/ArrayLiteral.java:
##########
@@ -52,6 +52,18 @@ public boolean isMinValue() {
 
     @Override
     public int compareLiteral(LiteralExpr expr) {
+        if (expr instanceof PlaceHolderExpr) {
+            return this.compareLiteral(((PlaceHolderExpr) expr).getLiteral());
+        }
+        if (expr instanceof NullLiteral) {
+            return 1;
+        }
+        if (expr == MaxLiteral.MAX_VALUE) {
+            return -1;
+        }
+        if (!(expr instanceof ArrayLiteral) || !type.equals(expr.type)) {
+            return -1;
+        }
         int size = Math.min(expr.getChildren().size(), this.children.size());
         for (int i = 0; i < size; i++) {

Review Comment:
   **Design question:** All `compareLiteral` overrides now return `-1` ("this < 
other") for incompatible/cross-type comparisons, replacing previous behavior 
that either threw exceptions or used string-based fallbacks. This is consistent 
across the codebase, but returning `-1` implies an ordering like `BoolLiteral < 
FloatLiteral`. Consider whether `-1` is the correct default for all cross-type 
pairs — the previous throwing behavior was more defensive and would catch bugs 
earlier. Returning `-1` may silently mask errors where incompatible types are 
inadvertently compared.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/PartitionDefinition.java:
##########
@@ -189,12 +200,73 @@ public void setPartitionTypes(List<DataType> 
partitionTypes) {
         this.partitionTypes = partitionTypes;
     }
 
+    protected void ensurePartitionTypesInitialized() {
+        if (partitionTypes == null) {
+            throw new AnalysisException("partitionTypes should be initialized 
before validating partition definition");
+        }
+    }
+
+    protected Expression typedPartitionExpression(Expression expression, int 
index) {
+        ensurePartitionTypesInitialized();
+        return strictTypedPartitionExpression(expression, 
partitionTypes.get(index));
+    }
+
+    static Expression strictTypedPartitionExpression(Expression expression, 
DataType targetType) {
+        if (expression instanceof MaxValue || expression instanceof 
MaxLiteral) {
+            return expression;
+        }
+        if (expression.isNullLiteral()) {
+            return expression.checkedCastTo(targetType);
+        }
+        if (!expression.isLiteral()) {
+            throw new AnalysisException("Partition value must be literal: " + 
expression.toSql());
+        }
+        String value = ((Literal) expression).getStringValue();
+        try {
+            if (targetType.isDateTimeType() || targetType.isDateTimeV2Type() 
|| targetType.isTimeStampTzType()) {
+                return convertPartitionLiteral(value, 
targetType).checkedCastTo(targetType);
+            }
+            validateCharacterLength(value, targetType);
+            LiteralExpr typedLiteral = LiteralExprUtils.createLiteral(
+                    value, targetType.toCatalogDataType());
+            return Literal.fromLegacyLiteral(typedLiteral, 
typedLiteral.getType());

Review Comment:
   **Minor note:** `validateCharacterLength` returns early for non-string 
types. After this check, `strictTypedPartitionExpression` calls 
`LiteralExprUtils.createLiteral()` for non-datetime types. If `createLiteral` 
does not independently validate string length for VARCHAR/CHAR partition 
columns, a value exceeding the column's length limit could slip through. Please 
verify that `createLiteral` performs length validation for CHAR/VARCHAR types. 
If it does, this validation is redundant but harmless.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [fix](TIMESTAMPTZ)Prevents timezone-unsafe reparsing using getStringValue() [doris]

Reply via email to