tustvold commented on issue #2637: URL: https://github.com/apache/arrow-rs/issues/2637#issuecomment-1256952040
I had a play around implementing this and it appears to be very tractable. What I did is as follows: * Define `struct i256([u8; 32])` and implement `ArrowNativeType` for it * Implement `ArrowPrimitiveType` for `Decimal128Type` and `Decimal256Type` with `Native` of `i128` and `i256` respectively * Replace DecimalArray and DecimalBuilder with PrimitiveArray and PrimitiveBuilder * Add specialized `impl PrimitiveArray<T: DecimalType>` adding `with_precision_and_scale` * Add specialized `impl PrimitiveBuilder<T: DecimalType>` adding `with_precision_and_scale` * Add `ArrayData::validate_decimal_values` which validates values are within bounds of decimal precision * Remove old decimal logic, including specialized take, comparison kernels, etc... * Add public free function to format signed little endian byte arrays with a given precision and scale * Implement ArrowNativeOps for `i128` and `i256` This appears to work quite well, and I'm fairly happy this will achieve the stated aims of massively simplifying the decimal implementation, making it more consistent with other types such as timestamps, and provide an easy path forward to getting full kernel coverage. There are, however, a couple of major implications of this change: * The values returned from `PrimitiveArray::value` will be the storage type, and **not** a semantic `Decimal` type. This is the same as timestamps, times, durations, etc... and I think is therefore fine, but is worth highlighting * Precision will **not** be validated by default, instead only via explicit calls to `validate_decimal_values` https://github.com/apache/arrow-rs/issues/2387. This makes the builders, `FromIterator` etc... consistently infallible (currently this is inconsistent) * Overflow for checked kernels will be at the boundary of the storage type, i.e. `i128` or `i256` and not based on precision, users who wish to catch precision-based overflow will need to use the checked calls and then make an explicit call to validate the result * Arithmetic across arrays with different scale will be non-trivial (although it would be complicated regardless) If people are happy with this path forward, I will polish up this approach over the coming week, but I'm loathe to work on this further if the consensus isn't there. Thoughts @alamb @liukun4515 @HaoYang670 @kmitchener? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
