Re: [PR] fix: Panic in Spark's `format_string` for illegal characters [datafusion]

via GitHub Mon, 11 May 2026 02:50:41 -0700


kosiew commented on code in PR #22077:
URL: https://github.com/apache/datafusion/pull/22077#discussion_r3217840394



##########
datafusion/spark/src/function/string/format_string.rs:
##########
@@ -861,6 +861,37 @@ fn take_numeric_param(s: &str, zero: bool) -> 
(NumericParam, &str) {
     }
 }
 
+/// Convert a `u32` to a [`char`] for the `%c` conversion, returning a SQL
+/// error if the value is not a valid Unicode scalar value (i.e. is in the
+/// surrogate range `0xD800..=0xDFFF` or above `0x10FFFF`). Java's `Formatter`

Review Comment:
   This is accurate that the behavior lines up with Java's Formatter and 
Character.isValidCodePoint, since both reject surrogates.
   
   I think the comment would be a little clearer if it led with the Rust and 
Unicode reason: surrogate values are not valid Unicode scalar values, so 
char::from_u32() rejects them.
   
   Suggested wording:
   
   "Surrogate values (0xD800 to 0xDFFF) and values above 0x10FFFF are not valid 
Unicode scalar values and are rejected, consistent with Java's code point 
validation."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: Panic in Spark's `format_string` for illegal characters [datafusion]

Reply via email to