github-actions[bot] commented on code in PR #63255:
URL: https://github.com/apache/doris/pull/63255#discussion_r3240926045
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/literal/JsonLiteral.java:
##########
@@ -49,8 +49,31 @@ public JsonLiteral(String value) {
}
if (jsonNode == null || jsonNode.isMissingNode()) {
throw new AnalysisException("Invalid jsonb literal: ''");
+ }
+ validateNoLoneSurrogate(jsonNode);
+ this.value = jsonNode.toString();
+ }
+
+ // RFC 8259 §8.2: JSON strings must not contain lone UTF-16 surrogates.
+ // Jackson accepts them by default, so we validate after parsing.
+ private static void validateNoLoneSurrogate(JsonNode node) {
+ if (node.isTextual()) {
+ String s = node.textValue();
+ for (int i = 0; i < s.length(); i++) {
+ char c = s.charAt(i);
+ if (Character.isHighSurrogate(c)) {
+ if (i + 1 >= s.length() ||
!Character.isLowSurrogate(s.charAt(i + 1))) {
+ throw new AnalysisException(
+ "Invalid jsonb literal: JSON string contains
lone high surrogate");
+ }
+ i++; // skip the paired low surrogate
+ } else if (Character.isLowSurrogate(c)) {
+ throw new AnalysisException(
+ "Invalid jsonb literal: JSON string contains lone
low surrogate");
+ }
+ }
} else {
- this.value = jsonNode.toString();
+ node.forEach(JsonLiteral::validateNoLoneSurrogate);
}
Review Comment:
This still lets invalid JSON strings through when the lone surrogate is in
an object field name, e.g. `{"\uD800":"v"}`. RFC 8259 string rules apply to
member names as well as values, but `JsonNode.forEach(...)` only visits object
values, so the field name is never checked before `jsonNode.toString()` is
stored. The Gson helper in the legacy `analysis.JsonLiteral` has the same gap
because it iterates `entrySet()` and validates only `entry.getValue()`. Please
validate object field names in both implementations and add tests for invalid
keys.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]