github-actions[bot] commented on code in PR #63255:
URL: https://github.com/apache/doris/pull/63255#discussion_r3240926045


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/literal/JsonLiteral.java:
##########
@@ -49,8 +49,31 @@ public JsonLiteral(String value) {
         }
         if (jsonNode == null || jsonNode.isMissingNode()) {
             throw new AnalysisException("Invalid jsonb literal: ''");
+        }
+        validateNoLoneSurrogate(jsonNode);
+        this.value = jsonNode.toString();
+    }
+
+    // RFC 8259 §8.2: JSON strings must not contain lone UTF-16 surrogates.
+    // Jackson accepts them by default, so we validate after parsing.
+    private static void validateNoLoneSurrogate(JsonNode node) {
+        if (node.isTextual()) {
+            String s = node.textValue();
+            for (int i = 0; i < s.length(); i++) {
+                char c = s.charAt(i);
+                if (Character.isHighSurrogate(c)) {
+                    if (i + 1 >= s.length() || 
!Character.isLowSurrogate(s.charAt(i + 1))) {
+                        throw new AnalysisException(
+                                "Invalid jsonb literal: JSON string contains 
lone high surrogate");
+                    }
+                    i++; // skip the paired low surrogate
+                } else if (Character.isLowSurrogate(c)) {
+                    throw new AnalysisException(
+                            "Invalid jsonb literal: JSON string contains lone 
low surrogate");
+                }
+            }
         } else {
-            this.value = jsonNode.toString();
+            node.forEach(JsonLiteral::validateNoLoneSurrogate);
         }

Review Comment:
   This still lets invalid JSON strings through when the lone surrogate is in 
an object field name, e.g. `{"\uD800":"v"}`. RFC 8259 string rules apply to 
member names as well as values, but `JsonNode.forEach(...)` only visits object 
values, so the field name is never checked before `jsonNode.toString()` is 
stored. The Gson helper in the legacy `analysis.JsonLiteral` has the same gap 
because it iterates `entrySet()` and validates only `entry.getValue()`. Please 
validate object field names in both implementations and add tests for invalid 
keys.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to