Re: [PR] fix: liberal parsing of zero scale decimals [arrow-rs]

via GitHub Wed, 14 Jan 2026 07:32:36 -0800


gruuya commented on code in PR #8700:
URL: https://github.com/apache/arrow-rs/pull/8700#discussion_r2689449683



##########
arrow-cast/src/parse.rs:
##########
@@ -741,72 +741,53 @@ fn parse_e_notation<T: DecimalType>(
     let mut exp: i16 = 0;
     let base = T::Native::usize_as(10);
 
-    let mut exp_start: bool = false;
     // e has a plus sign
     let mut pos_shift_direction: bool = true;
 
-    // skip to point or exponent index
-    let mut bs;
-    if fractionals > 0 {
-        // it's a fraction, so the point index needs to be skipped, so +1
-        bs = s.as_bytes().iter().skip(index + fractionals as usize + 1);
-    } else {
-        // it's actually an integer that is already written into the result, 
so let's skip on to e
-        bs = s.as_bytes().iter().skip(index);
-    }
+    // skip to the exponent index directly or just after any processed 
fractionals
+    let mut bs = s.as_bytes().iter().skip(index + fractionals as usize);
 
-    while let Some(b) = bs.next() {
+    // complete parsing of any unprocessed fractionals up to the exponent

Review Comment:
   That is correct—this is indeed (more than) slightly confusing, and it stems 
from unclear separation of concerns. We skip parsing any fractionals after 
scale in `parse_decimal`, _unless_ we hit into the e-notation and realise we 
may actually need those fractionals after all, in which case we parse them in 
`parse_e_notation`.
   
   This is also one more argument for going with the above suggestion:
   
   > To fix this i think we need to s.split_once(['e', 'E']) first, then 
optionally process the exponent, and only then move on to the mantissa, since 
then we know exactly what numbers can be thrown away. This however goes beyond 
the scope of this PR/issue I think, and probably requires more discussion 
(especially around performance implications).
   
   Besides fixing another category of edge cases, this would also simplify the 
logic. I'm only worried about performance though, since that would change the 
current one-pass into a two-pass algorithm.
   
   For now I'll try to elaborate in the comment a bit more. I'll also open an 
issue for a cleaner `parse_decimal`/`parse_e_notation` separation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] fix: liberal parsing of zero scale decimals [arrow-rs]

Reply via email to