[GitHub] [arrow-rs] tustvold commented on issue #2387: Decimal Precision Validation

GitBox Tue, 23 Aug 2022 03:16:39 -0700


tustvold commented on issue #2387:
URL: https://github.com/apache/arrow-rs/issues/2387#issuecomment-1223861704


   I'm really struggling to review the decimal array work without an answer to 
this, as the current approach is wildly inconsistent to the point of being 
effectively meaningless... As it isn't consistently enforced, there is no 
reliable way to elide validation as you cannot assume validation has actually 
been performed
   
   * from_iter - no validation
   * with_precision_and_scale - validation if precision less
   * DecimalBuilder - disable validation with safe API
   
   The result is that **it is perfectly possible to construct a DecimalArray 
with overflowed values using safe APIs,** consequently it is unclear how you 
can then meaningfully elide validation, or reason about if an operation will 
overflow, as you can't guarantee the input data wasn't already overflowed... It 
is also unclear what the meaning of validation is...
   
   For example, if you use `from_iter` and then `with_precision_and_scale` this 
will elide the validation if the new precision is greater than the default, 
despite the validation of the default never occurring...
   
   I can see 3 possible paths forward:
   
   **Strict Validation**
   
   * Invariant that no values in a DecimalArray exceed the precision
   * Aways validate on construction, and mutation
    
   This is the most "correct" interpretation, but has significant performance 
drawbacks, and is much stronger than the guarantees we provide for other 
arithmetic types.
   
   **Loose Validation**
   
   * Mutation operations on DecimalArray saturate at the bounds of the 
underlying stored type
   * Continue to provide APIs to validate no values have overflowed
   * Perfectly legal for a DecimalArray to contain overflowed values, and it 
must not cause UB for this to be the case
   
   This would allow opt-in overflow detection, but would require the user to 
perform validation after every arithmetic operation
   
   **No Validation**
   
   Continue to provide APIs to validate all values are within bounds, but don't 
attempt to detect or handle overflow within kernels. This is consistent with 
how we handle overflow elsewhere, is perfectly well defined.
   
   > I think that is why the things become complicated. We may need to validate 
decimal values at some moments
   
   My 2 cents on this is we only need to validate the invariants that arrow-rs 
needs to in order to prevent UB. If some other software component has 
additional invariants, it is on that component to verify them. Provided we 
clearly document the invariants we uphold, I don't think this is an issue.
   
   My preference would therefore be to option 3, as it is the simplest and 
fastest to implement, and is consistent with how we handle overflow elsewhere.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-rs] tustvold commented on issue #2387: Decimal Precision Validation

Reply via email to