gruuya commented on code in PR #8700:
URL: https://github.com/apache/arrow-rs/pull/8700#discussion_r2689449683
##########
arrow-cast/src/parse.rs:
##########
@@ -741,72 +741,53 @@ fn parse_e_notation<T: DecimalType>(
let mut exp: i16 = 0;
let base = T::Native::usize_as(10);
- let mut exp_start: bool = false;
// e has a plus sign
let mut pos_shift_direction: bool = true;
- // skip to point or exponent index
- let mut bs;
- if fractionals > 0 {
- // it's a fraction, so the point index needs to be skipped, so +1
- bs = s.as_bytes().iter().skip(index + fractionals as usize + 1);
- } else {
- // it's actually an integer that is already written into the result,
so let's skip on to e
- bs = s.as_bytes().iter().skip(index);
- }
+ // skip to the exponent index directly or just after any processed
fractionals
+ let mut bs = s.as_bytes().iter().skip(index + fractionals as usize);
- while let Some(b) = bs.next() {
+ // complete parsing of any unprocessed fractionals up to the exponent
Review Comment:
That is correct—this is indeed (more than) slightly confusing, and it stems
from unclear separation of concerns. We skip parsing any fractionals after
scale in `parse_decimal`, _unless_ we hit into the e-notation and realise we
may actually need those fractionals after all, in which case we parse them in
`parse_e_notation`.
This is also one more argument for going with the above suggestion:
> To fix this i think we need to s.split_once(['e', 'E']) first, then
optionally process the exponent, and only then move on to the mantissa, since
then we know exactly what numbers can be thrown away. This however goes beyond
the scope of this PR/issue I think, and probably requires more discussion
(especially around performance implications).
Besides fixing another category of edge cases, this would also simplify the
logic. I'm only worried about performance though, since that would change the
current one-pass into a two-pass algorithm.
For now I'll try to elaborate in the comment a bit more. I'll also open an
issue for a cleaner `parse_decimal`/`parse_e_notation` separation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]