nicklan commented on code in PR #7721:
URL: https://github.com/apache/arrow-rs/pull/7721#discussion_r2159686914


##########
arrow-json/src/reader/tape.rs:
##########
@@ -705,9 +705,16 @@ fn err(b: u8, ctx: &str) -> ArrowError {
 
 /// Creates a character from an UTF-16 surrogate pair
 fn char_from_surrogate_pair(low: u16, high: u16) -> Result<char, ArrowError> {
-    let n = (((high - 0xD800) as u32) << 10) | ((low - 0xDC00) as u32 + 
0x1_0000);
-    char::from_u32(n)
-        .ok_or_else(|| ArrowError::JsonError(format!("Invalid UTF-16 surrogate 
pair {n}")))
+    match (low, high) {
+        (0xDC00..=0xDFFF, 0xD800..=0xDBFF) => {
+            let n = (((high - 0xD800) as u32) << 10) | ((low - 0xDC00) as u32 
+ 0x1_0000);
+            char::from_u32(n)
+                .ok_or_else(|| ArrowError::JsonError(format!("Invalid UTF-16 
surrogate pair {n}")))
+        }
+        _ => Err(ArrowError::JsonError(format!(
+            "Invalid UTF-16 surrogate pair. High: {high:#02X}, Low: {low:#02X}"
+        ))),
+    }

Review Comment:
   I did consider this, and it does have the benefit of being more succinct. I 
opted for the range check because if one of the arguments was too high the 
checked sub would not catch it and I wasn't able to convince myself that only 
valid inputs would allow `char::from_u32` to succeed.
   
   I could dig more into the calling layers to check if that's prevented there, 
but it still felt like more of a foot-gun that the simple range check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to