tustvold commented on issue #2637:
URL: https://github.com/apache/arrow-rs/issues/2637#issuecomment-1256952040

   I had a play around with implementing this and it appears to be very tractable. What I did was as follows:
   
   * Define `struct i256([u8; 32])` and implement `ArrowNativeType` for it (a rough sketch of this shape follows the list)
   * Implement `ArrowPrimitiveType` for `Decimal128Type` and `Decimal256Type` 
with `Native` of `i128` and `i256` respectively
   * Replace `DecimalArray` and `DecimalBuilder` with `PrimitiveArray` and `PrimitiveBuilder`
   * Add specialized `impl PrimitiveArray<T: DecimalType>` adding 
`with_precision_and_scale`
   * Add specialized `impl PrimitiveBuilder<T: DecimalType>` adding 
`with_precision_and_scale`
   * Add `ArrayData::validate_decimal_values`, which validates that values are within the bounds of the decimal precision
   * Remove the old decimal logic, including the specialized take and comparison kernels, etc.
   * Add a public free function to format signed little-endian byte arrays with a given precision and scale
   * Implement `ArrowNativeOps` for `i128` and `i256`
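
   To make the shape concrete, below is a minimal, self-contained sketch of the pieces above. The trait definitions are simplified stand-ins for `ArrowNativeType`, `ArrowPrimitiveType` and `DecimalType` (the real arrow-rs traits carry more bounds and constants), and the `with_precision_and_scale` body shows one plausible validity check, not the final implementation:

```rust
/// 256-bit signed integer stored as 32 little-endian bytes.
#[allow(non_camel_case_types)]
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct i256([u8; 32]);

/// Stand-in for `ArrowNativeType`: a fixed-size value that can back a `PrimitiveArray`.
pub trait NativeType: Copy {}
impl NativeType for i128 {}
impl NativeType for i256 {}

/// Stand-in for `ArrowPrimitiveType`: ties a logical Arrow type to its physical storage type.
pub trait PrimitiveType {
    type Native: NativeType;
}

pub struct Decimal128Type;
pub struct Decimal256Type;

impl PrimitiveType for Decimal128Type {
    type Native = i128;
}
impl PrimitiveType for Decimal256Type {
    type Native = i256;
}

/// Marker trait for the two decimal types, enabling specialized impls on
/// `PrimitiveArray<T: DecimalType>` such as `with_precision_and_scale`.
pub trait DecimalType: PrimitiveType {
    const MAX_PRECISION: u8;
}

impl DecimalType for Decimal128Type {
    const MAX_PRECISION: u8 = 38;
}
impl DecimalType for Decimal256Type {
    const MAX_PRECISION: u8 = 76;
}

/// Simplified stand-in for the generic primitive array.
pub struct PrimitiveArray<T: PrimitiveType> {
    values: Vec<T::Native>,
    precision: u8,
    scale: u8,
}

impl<T: DecimalType> PrimitiveArray<T> {
    /// Attach precision and scale after construction, checking only that the
    /// pair itself is in range; per-value validation stays a separate step.
    pub fn with_precision_and_scale(mut self, precision: u8, scale: u8) -> Result<Self, String> {
        if precision == 0 || precision > T::MAX_PRECISION || scale > precision {
            return Err(format!("invalid precision/scale ({precision}, {scale})"));
        }
        self.precision = precision;
        self.scale = scale;
        Ok(self)
    }
}
```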
   
   This appears to work quite well, and I'm fairly happy this will achieve the stated aims: massively simplifying the decimal implementation, making it more consistent with other types such as timestamps, and providing an easy path forward to full kernel coverage.
   
   There are, however, a couple of major implications of this change:
   
   * The values returned from `PrimitiveArray::value` will be the storage type, and **not** a semantic `Decimal` type. This is the same as for timestamps, times, durations, etc., and I therefore think it is fine, but it is worth highlighting
   * Precision will **not** be validated by default, only via explicit calls to `validate_decimal_values` (see https://github.com/apache/arrow-rs/issues/2387). This makes the builders, `FromIterator`, etc. consistently infallible (currently this is inconsistent)
   * Overflow for checked kernels will be at the boundary of the storage type, i.e. `i128` or `i256`, and not based on precision; users who wish to catch precision-based overflow will need to use the checked calls and then explicitly validate the result (see the sketch after this list)
   * Arithmetic across arrays with different scale will be non-trivial 
(although it would be complicated regardless)
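
   To illustrate the overflow point, here is a self-contained sketch (plain `i128`, not the arrow-rs API) of catching precision-based overflow after a checked operation; `validate_precision` is a hypothetical helper standing in for the proposed `validate_decimal_values`:

```rust
/// Largest value representable with `precision` decimal digits.
fn max_for_precision(precision: u32) -> i128 {
    10i128.pow(precision) - 1
}

/// Hypothetical per-value check mirroring the proposed
/// `ArrayData::validate_decimal_values`.
fn validate_precision(values: &[i128], precision: u32) -> Result<(), String> {
    let max = max_for_precision(precision);
    for &v in values {
        if v > max || v < -max {
            return Err(format!("{v} exceeds precision {precision}"));
        }
    }
    Ok(())
}

fn main() {
    // Two Decimal128(5, 2) values stored as scaled integers:
    // 999.99 + 0.02 overflows precision 5 but not the i128 storage type.
    let a: i128 = 99_999; // 999.99
    let b: i128 = 2;      // 0.02

    // A checked kernel would only fail at the i128 boundary...
    let sum = a.checked_add(b).expect("no i128 overflow here");

    // ...so precision-based overflow is caught by the explicit follow-up check.
    assert!(validate_precision(&[sum], 5).is_err());
    println!("sum = {sum}, which exceeds 5-digit precision as expected");
}
```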
   
   If people are happy with this path forward, I will polish up this approach over the coming week, but I'm loath to work on this further if the consensus isn't there.
   
   Thoughts @alamb @liukun4515 @HaoYang670 @kmitchener?

