[GitHub] [arrow-rs] liukun4515 commented on issue #2387: Decimal Precision Validation

GitBox Tue, 23 Aug 2022 05:32:25 -0700


liukun4515 commented on issue #2387:
URL: https://github.com/apache/arrow-rs/issues/2387#issuecomment-1224004677


   > I'm really struggling to review the decimal array work without an answer 
to this, as the current approach is wildly inconsistent to the point of being 
effectively meaningless... As it isn't consistently enforced, there is no 
reliable way to elide validation as you cannot assume validation has actually 
been performed
   > 
   > * from_iter - no validation
   > * with_precision_and_scale - validation if precision less
   > * DecimalBuilder - disable validation with safe API
   > 
   > The result is that **it is perfectly possible to construct a DecimalArray 
with overflowed values using safe APIs,** consequently it is unclear how you 
can then meaningfully elide validation, or reason about if an operation will 
overflow, as you can't guarantee the input data wasn't already overflowed... It 
is also unclear what the meaning of validation is...
   > 
   > For example, if you use `from_iter` to `collect` an iterator into a 
`DecimalArray` and then use `with_precision_and_scale` to set the precision and 
scale, this will elide the validation if the new precision is greater than the 
default, despite the validation never occurring...
   > 
   > I can see 3 possible paths forward:
   > 
   > **Strict Validation**
   > 
   > * Invariant that no values in a DecimalArray exceed the precision
   > * Aways validate on construction, and mutation
   > 
   > This is the most "correct" interpretation, but has significant performance 
drawbacks, and is much stronger than the guarantees we provide for other 
arithmetic types.
   > 
   > **Loose Validation**
   > 
   > * Mutation operations on DecimalArray saturate at the bounds of the 
underlying stored type
   > * Continue to provide APIs to validate no values have overflowed
   > * Perfectly legal for a DecimalArray to contain overflowed values, and it 
must not cause UB for this to be the case
   > 
   > This would allow opt-in overflow detection, but would require the user to 
perform validation after every arithmetic operation
   > 
   > **No Validation**
   > 
   > Continue to provide APIs to validate all values are within bounds, but 
don't attempt to detect or handle overflow within kernels. This is consistent 
with how we handle overflow elsewhere and is perfectly well defined.
   > 
   > > I think that is why the things become complicated. We may need to 
validate decimal values at some moments
   > 
   > My 2 cents on this is we only need to validate the invariants that 
arrow-rs needs to in order to prevent UB. If some other software component has 
additional invariants, it is on that component to verify them. Provided we 
clearly document the invariants we uphold, I don't think this is an issue.
   > 
   > My preference would therefore be to option 3, as it is the simplest and 
fastest to implement, and is consistent with how we handle overflow elsewhere.
   
   In my opinion, the reason of doing validation for the decimal array is that 
we would like to make sure all the elements in the array are the range for the 
precision.
   If we don't care about the if the element in the array is out of bounds, 
then we don't need to do validation in the kernel.
   
   For example, I call the `Cast` method to cast `int32 array` to `int8 array`, 
do we need to do validation and check the overflow for the casting? If we need 
to do the check and return the error message to user, we need to do validation 
in the kernel api.
   
   But as far as Decimal data type, If I cast the `Decimal(8,0) array) to 
`Decimal(12,0) array`, each element can't be overflow or out of the range for 
the new precision, we don't need to do validation and improve the performance.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-rs] liukun4515 commented on issue #2387: Decimal Precision Validation

Reply via email to