Re: [PR] fix: liberal parsing of zero scale decimals [arrow-rs]

via GitHub Thu, 15 Jan 2026 03:07:20 -0800


gruuya commented on code in PR #8700:
URL: https://github.com/apache/arrow-rs/pull/8700#discussion_r2693688957



##########
arrow-cast/src/parse.rs:
##########
@@ -2752,6 +2740,35 @@ mod tests {
             let result = parse_decimal::<Decimal256Type>(s, 76, scale);
             assert_eq!(i, result.unwrap());
         }
+
+        let zero_scale_tests = [
+            ("0.123", 0, 3),
+            ("1.0", 1, 3),
+            ("1.2", 1, 3),
+            ("1.00", 1, 3),
+            ("1.23", 1, 3),
+            ("1.000", 1, 3),
+            ("1.123", 1, 3),
+            ("123.0", 123, 3),
+            ("123.4", 123, 3),
+            ("123.00", 123, 3),
+            ("123.45", 123, 3),
+            ("123.000000000000000000004", 123, 3),
+            ("0.123e2", 12, 3),
+            ("0.123e4", 1230, 10),
+            ("1.23e4", 12300, 10),
+            ("12.3e4", 123000, 10),
+            ("123e4", 1230000, 10),
+            (
+                "20000000000000000000000000000000000002.0",
+                20000000000000000000000000000000000002,
+                38,
+            ),
+        ];
+        for (s, i, precision) in zero_scale_tests {
+            let result_128 = parse_decimal::<Decimal128Type>(s, precision, 
0).unwrap();
+            assert_eq!(i, result_128);
+        }

Review Comment:
   Thanks, these cases turned out to be very valuable. 
   
   For one thing `"."` case wasn't being correctly handled for zero scale 
decimals, I fixed that now by factoring out various validation points 
throughout the code into a common unsupported check at the start
   ```rust
       if matches!(s, "" | "-" | "+" | ".") {
           return Err(ArrowError::ParseError(format!(
               "can't parse the string value {s} to decimal"
           )));
       }
   ```
   I found this is a lot easier to grok, since previously these validations 
were scattered throughout `parse_decimal` and deduced implicitly based on 
side-effects (e.g. we finished parsing a decimal with a decimal point but the 
digit count is zero). Let me know if you have performance concerns about this, 
and I can resort to the old (implicit) manner on deducing unsupported strings.
   
   For another thing, the test cases `"1e"`, `"1e+"` and `"1e-"` are curious. 
Strictly speaking they aren't valid, but I wouldn't be surprised if some system 
can parse/generate these. The question is what should `parse_decimal` output 
for them. I tested these on both main and this branch:
   ```
   | input                                          | output(main)              
                                        | output(this branch) |
   
|------------------------------------------------|-------------------------------------------------------------------|---------------------|
   | parse_decimal::<Decimal128Type>("1e", 5, 0)    | Err(ParseError("can't 
parse the string value 1e to decimal"))     | Ok(1)               |
   | parse_decimal::<Decimal128Type>("1e+", 5, 0)   | Ok(1)                     
                                        | Ok(1)               |
   | parse_decimal::<Decimal128Type>("1e-", 5, 0)   | Ok(1)                     
                                        | Ok(1)               |
   | parse_decimal::<Decimal128Type>("1e", 5, 2)    | Err(ParseError("can't 
parse the string value 1e to decimal"))     | Ok(100)             |
   | parse_decimal::<Decimal128Type>("1e+", 5, 2)   | Ok(100)                   
                                        | Ok(100)             |
   | parse_decimal::<Decimal128Type>("1e-", 5, 2)   | Ok(100)                   
                                        | Ok(100)             |
   ```
   To make these consistent I think we should error out on all of them (that's 
what Postgres does), at least until there's a worthwhile precedent to parse 
some of these.
   
   So _if_ the `matches!` approach from above is fine with you, and _if_ you 
agree we should make these unsupported, then I can just append these patterns 
there. Otherwise, let me know which ones would you think it'd make sense to 
parse.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] fix: liberal parsing of zero scale decimals [arrow-rs]

Reply via email to