Re: [PR] fix: support full-width and null characters, and negative scale in string to decimal [datafusion-comet]

via GitHub Mon, 13 Apr 2026 11:00:16 -0700


parthchandra commented on code in PR #3922:
URL: https://github.com/apache/datafusion-comet/pull/3922#discussion_r3074861178



##########
native/spark-expr/src/conversion_funcs/string.rs:
##########
@@ -446,16 +480,30 @@ fn parse_string_to_decimal(input_str: &str, precision: u8, scale: i8) -> SparkRe
     let mut start = 0;
     let mut end = string_bytes.len();
 
-    // trim whitespaces
-    while start < end && string_bytes[start].is_ascii_whitespace() {
+    // Trim ASCII whitespace and null bytes from both ends. Spark's UTF8String
+    // trims null bytes the same way it trims whitespace: "123\u0000" and
+    // "\u0000123" both parse as 123. Null bytes in the middle are not trimmed
+    // and will fail the digit validation in parse_decimal_str, producing NULL.
+    while start < end && (string_bytes[start].is_ascii_whitespace() || string_bytes[start] == 0) {
         start += 1;
     }
-    while end > start && string_bytes[end - 1].is_ascii_whitespace() {
+    while end > start && (string_bytes[end - 1].is_ascii_whitespace() || string_bytes[end - 1] == 0)
+    {
         end -= 1;
     }
 
     let trimmed = &input_str[start..end];
 
+    // Normalize fullwidth digits to ASCII. Fast path skips the allocation for
+    // pure-ASCII strings, which is the common case.
+    let normalized;
+    let trimmed = if trimmed.bytes().any(|b| b > 0x7F) {

Review Comment:
   The previous loops only advance `start` and `end`, so they never iterate
   over the non-whitespace characters. This check, by contrast, scans the
   middle of the string as well. I am not sure whether this can be improved.
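
For illustration, here is a minimal sketch of one way the scan could be kept cheap. It assumes the normalization only needs to map fullwidth digits (U+FF10 to U+FF19) to ASCII. The helper name `normalize_fullwidth_digits` is hypothetical and not taken from the PR. It uses `str::is_ascii` in place of the hand-written `bytes().any(..)` check and returns a `Cow` so that pure-ASCII input is borrowed instead of copied. The scan over the middle of the string is still there; only its cost may differ.

```rust
use std::borrow::Cow;

/// Hypothetical helper (an assumption, not code from the PR): maps fullwidth
/// digits U+FF10..=U+FF19 to ASCII '0'..='9' and leaves all other characters
/// unchanged. Pure-ASCII input is returned borrowed, with no allocation.
fn normalize_fullwidth_digits(s: &str) -> Cow<'_, str> {
    // Fast path: `is_ascii` still has to look at every byte, but it is a
    // single standard-library call rather than a hand-written closure.
    if s.is_ascii() {
        return Cow::Borrowed(s);
    }
    // Slow path: rebuild the string, replacing only fullwidth digits.
    Cow::Owned(
        s.chars()
            .map(|c| match c {
                // The offset from U+FF10 is 0..=9, so it fits in a u8.
                '\u{FF10}'..='\u{FF19}' => (b'0' + (c as u32 - 0xFF10) as u8) as char,
                _ => c,
            })
            .collect(),
    )
}
```

With this shape, `normalize_fullwidth_digits("123")` borrows its input, while `normalize_fullwidth_digits("１２３")` allocates a new `String` containing `"123"`. Whether this is measurably faster than the current check would need a benchmark.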


