gruuya opened a new issue, #8699:
URL: https://github.com/apache/arrow-rs/issues/8699

   **Describe the bug**
   Decimal strings that contain fractional digits are parsed incorrectly and 
inconsistently when the target scale is zero. For instance, parsing `"1.0"` 
with precision 3 and scale 0 returns `10` (instead of `1`), and parsing 
`"123.0"` with precision 3 and scale 0 fails with `parse decimal overflow 
(123.0)` (instead of returning `123`).
   
   **To Reproduce**
   Add this set of test cases to `test_parse_decimal_with_parameter` in 
arrow-cast:
   ```rust
           let zero_scale_tests = [
               ("1.0", 1),      // 10
               ("1.2", 1),      // 12
               ("1.00", 1),     // 100
               ("1.23", 1),     // 123
               ("1.000", 1),    // "parse decimal overflow (1.000)"
               ("1.123", 1),    // "parse decimal overflow (1.123)"
               ("123.0", 123),  // "parse decimal overflow (123.0)"
               ("123.4", 123),  // "parse decimal overflow (123.4)"
               ("123.00", 123), // "parse decimal overflow (123.00)"
               ("123.45", 123), // "parse decimal overflow (123.45)"
           ];
           for (s, i) in zero_scale_tests {
               let result_128 = parse_decimal::<Decimal128Type>(s, 3, 0).unwrap();
               assert_eq!(i, result_128);
           }
   ```
   
   **Expected behavior**
   None of the above cases returns the expected result (although in some cases 
the expected result is debatable). The comment beside each case shows the 
value or error that is actually returned.
   
   Depending on how liberal/flexible `parse_decimal` is intended to be, I can 
see a few options for the expected behavior of the above cases (i.e. when the 
scale is zero):
   1. only numbers with a single zero in the decimal part can be parsed, others 
must error out
   2. only numbers with zeros in the decimal part can be parsed, others must 
error out
   3. all above numbers can be parsed
   
   Given that `parse_decimal` already does some truncation 
(https://github.com/apache/arrow-rs/blob/d519bb800340fa1a5e2601ae51cba82be3a7aa4b/arrow-cast/src/parse.rs#L2529), 
I'm inclined to think the last option would be fine.
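
To make the three options listed above concrete, here is a minimal standalone sketch of how each would treat a scale-0 input. It is not the arrow-cast implementation: `FractionPolicy` and `parse_scale_zero` are hypothetical names invented for illustration, and signs, exponents, and whitespace are deliberately ignored.

```rust
/// Hypothetical policies mirroring the three options listed above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FractionPolicy {
    /// Option 1: accept only a single "0" after the decimal point.
    SingleZero,
    /// Option 2: accept any non-empty run of zeros after the decimal point.
    ZerosOnly,
    /// Option 3: accept any fractional digits and drop them.
    Truncate,
}

/// Parse `s` as a scale-0 decimal with at most `precision` integer digits.
/// Illustrative only: this is an assumption-laden sketch, not arrow-cast code.
fn parse_scale_zero(s: &str, precision: usize, policy: FractionPolicy) -> Result<i128, String> {
    // Split into integer and (optional) fractional digit strings.
    let (int_part, frac_part) = match s.split_once('.') {
        Some((i, f)) => (i, Some(f)),
        None => (s, None),
    };
    if int_part.is_empty() || !int_part.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("can't parse the string value {s} to decimal"));
    }
    if let Some(frac) = frac_part {
        if !frac.bytes().all(|b| b.is_ascii_digit()) {
            return Err(format!("can't parse the string value {s} to decimal"));
        }
        // Decide whether the fractional digits are acceptable at scale 0.
        let accepted = match policy {
            FractionPolicy::SingleZero => frac == "0",
            FractionPolicy::ZerosOnly => !frac.is_empty() && frac.bytes().all(|b| b == b'0'),
            FractionPolicy::Truncate => true,
        };
        if !accepted {
            return Err(format!("fractional digits not allowed at scale 0 ({s})"));
        }
    }
    // With scale 0, only the integer digits count towards the precision.
    let significant = int_part.trim_start_matches('0');
    if significant.len() > precision {
        return Err(format!("parse decimal overflow ({s})"));
    }
    int_part.parse::<i128>().map_err(|e| e.to_string())
}
```

With precision 3, `"1.0"` is accepted under every policy, `"123.00"` is rejected only by `SingleZero`, and `"123.45"` is accepted only by `Truncate`. In all accepted cases the fractional digits never contribute to the result or to the precision check, which is the behavior the test cases above expect.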
   
   **Additional context**
   One way this problem manifests is when parsing min/max values in Delta 
tables. We've run into a case where a decimal/numeric type with scale 0 has one 
of those values set as `"_.0"`, which in turn crashes the JSON reader when it 
tries to parse the log file, so the table ends up broken. (cc @ion-elgreco: it 
is also somewhat unfortunate that delta-rs picked f64 to 
[serialize](https://github.com/delta-io/delta-rs/blob/acfaee5c07e47e9dadb3bbb5201ab75b245bf9b1/crates/core/src/writer/stats.rs#L247) 
the decimal type.)

