morrySnow opened a new pull request, #63255: URL: https://github.com/apache/doris/pull/63255
## Summary

Fixes DORIS-25576.

Nereids `JsonLiteral` and legacy `analysis.JsonLiteral` silently accepted lone UTF-16 surrogates (e.g. `'"\uD800"'::JSONB`) because Jackson and Gson both parse such inputs without error by default. The JSON grammar in RFC 8259 admits these escapes, but §8.2 warns that such strings cannot encode Unicode characters and that the behavior of software receiving them is unpredictable. Silent acceptance causes data-correctness issues: the invalid value is stored in BE and surfaces as an error only later, during export, cross-system transfer, or UTF-8 serialization.

## What problem does this PR solve?

Issue Number: close #DORIS-25576

Problem Summary: Add a recursive `validateNoLoneSurrogate` post-parse walk in both `JsonLiteral` constructors. The walk throws `AnalysisException` immediately for any string node containing a lone high or low surrogate.

### Changes

- `fe/fe-core/.../nereids/.../JsonLiteral.java`: add `validateNoLoneSurrogate(JsonNode)`, called after Jackson parsing
- `fe/fe-catalog/.../analysis/JsonLiteral.java`: add `validateNoLoneSurrogate(JsonElement)`, called after Gson parsing
- `fe/fe-core/src/test/.../JsonLiteralTest.java`: unit tests covering lone-high, lone-low, nested, and valid surrogate-pair cases

## Release note

JSONB literal expressions now reject strings containing lone UTF-16 surrogates (e.g. `'"\uD800"'::JSONB`) with an `AnalysisException`, in line with the interoperability guidance in RFC 8259 §8.2.

## Check List (For Author)

- Test: Unit Test (`JsonLiteralTest`: lone-surrogate rejection and valid surrogate-pair acceptance)
- Behavior changed: Yes. Lone surrogates in JSONB literals now throw `AnalysisException` instead of being silently accepted
- Does this need documentation: No
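For readers who want a concrete picture of the post-parse walk, here is a minimal, dependency-free sketch of the idea. It is not the code in this PR. The class name `LoneSurrogateCheck`, the helper `hasLoneSurrogate`, the use of plain `Map`/`List`/`String` values as a stand-in for Jackson's `JsonNode` or Gson's `JsonElement`, and the use of `IllegalArgumentException` in place of `AnalysisException` are all assumptions made for illustration. Whether the actual patch also checks object keys is not stated in the description; the sketch does so as an assumption.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the validation described in the PR; names are illustrative only.
final class LoneSurrogateCheck {

    private LoneSurrogateCheck() {
    }

    /**
     * Returns true if the string contains a high surrogate that is not
     * immediately followed by a low surrogate, or a low surrogate that is
     * not immediately preceded by a high surrogate.
     */
    public static boolean hasLoneSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid pair: skip its low half
                } else {
                    return true; // high surrogate with no partner
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate that did not follow a high one
            }
        }
        return false;
    }

    /**
     * Recursively walks a parsed value (String, Map, List, or any other leaf)
     * and throws on the first string that contains a lone surrogate.
     * Checking map keys as well as values is an assumption of this sketch.
     */
    public static void validateNoLoneSurrogate(Object node) {
        if (node instanceof String) {
            if (hasLoneSurrogate((String) node)) {
                throw new IllegalArgumentException("JSON string contains a lone UTF-16 surrogate");
            }
        } else if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                validateNoLoneSurrogate(entry.getKey());
                validateNoLoneSurrogate(entry.getValue());
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                validateNoLoneSurrogate(child);
            }
        }
        // Numbers, booleans, and null carry no string data, so nothing to check.
    }
}
```

In this sketch, `LoneSurrogateCheck.validateNoLoneSurrogate(List.of(Map.of("k", "a\uD800b")))` throws, while a value containing the valid pair `"\uD83D\uDE00"` passes. The check runs on the decoded string rather than on the raw JSON text, so it does not matter whether the surrogate arrived as an escape sequence or as a literal code unit.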
