tustvold commented on issue #2387:
URL: https://github.com/apache/arrow-rs/issues/2387#issuecomment-1223861704
I'm really struggling to review the decimal array work without an answer to
this, as the current approach is wildly inconsistent to the point of being
effectively meaningless... As it isn't consistently enforced, there is no
reliable way to elide validation as you cannot assume validation has actually
been performed
* from_iter - no validation
* with_precision_and_scale - validation if precision less
* DecimalBuilder - disable validation with safe API
The result is that **it is perfectly possible to construct a DecimalArray
with overflowed values using safe APIs,** consequently it is unclear how you
can then meaningfully elide validation, or reason about if an operation will
overflow, as you can't guarantee the input data wasn't already overflowed... It
is also unclear what the meaning of validation is...
For example, if you use `from_iter` and then `with_precision_and_scale` this
will elide the validation if the new precision is greater than the default,
despite the validation of the default never occurring...
I can see 3 possible paths forward:
**Strict Validation**
* Invariant that no values in a DecimalArray exceed the precision
* Aways validate on construction, and mutation
This is the most "correct" interpretation, but has significant performance
drawbacks, and is much stronger than the guarantees we provide for other
arithmetic types.
**Loose Validation**
* Mutation operations on DecimalArray saturate at the bounds of the
underlying stored type
* Continue to provide APIs to validate no values have overflowed
* Perfectly legal for a DecimalArray to contain overflowed values, and it
must not cause UB for this to be the case
This would allow opt-in overflow detection, but would require the user to
perform validation after every arithmetic operation
**No Validation**
Continue to provide APIs to validate all values are within bounds, but don't
attempt to detect or handle overflow within kernels. This is consistent with
how we handle overflow elsewhere, is perfectly well defined.
> I think that is why the things become complicated. We may need to validate
decimal values at some moments
My 2 cents on this is we only need to validate the invariants that arrow-rs
needs to in order to prevent UB. If some other software component has
additional invariants, it is on that component to verify them. Provided we
clearly document the invariants we uphold, I don't think this is an issue.
My preference would therefore be to option 3, as it is the simplest and
fastest to implement, and is consistent with how we handle overflow elsewhere.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]