nicklan commented on code in PR #7721: URL: https://github.com/apache/arrow-rs/pull/7721#discussion_r2159686914
########## arrow-json/src/reader/tape.rs: ########## @@ -705,9 +705,16 @@ fn err(b: u8, ctx: &str) -> ArrowError { /// Creates a character from an UTF-16 surrogate pair fn char_from_surrogate_pair(low: u16, high: u16) -> Result<char, ArrowError> { - let n = (((high - 0xD800) as u32) << 10) | ((low - 0xDC00) as u32 + 0x1_0000); - char::from_u32(n) - .ok_or_else(|| ArrowError::JsonError(format!("Invalid UTF-16 surrogate pair {n}"))) + match (low, high) { + (0xDC00..=0xDFFF, 0xD800..=0xDBFF) => { + let n = (((high - 0xD800) as u32) << 10) | ((low - 0xDC00) as u32 + 0x1_0000); + char::from_u32(n) + .ok_or_else(|| ArrowError::JsonError(format!("Invalid UTF-16 surrogate pair {n}"))) + } + _ => Err(ArrowError::JsonError(format!( + "Invalid UTF-16 surrogate pair. High: {high:#02X}, Low: {low:#02X}" + ))), + } Review Comment: I did consider this, and it does have the benefit of being more succinct. I opted for the range check because if one of the arguments was too high the checked sub would not catch it and I wasn't able to convince myself that only valid inputs would allow `char::from_u32` to succeed. I could dig more into the calling layers to check if that's prevented there, but it still felt like more of a foot-gun than the simple range check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org