liukun4515 commented on issue #2637: URL: https://github.com/apache/arrow-rs/issues/2637#issuecomment-1257323227
> I had a play around implementing this and it appears to be very tractable. What I did is as follows:
>
> * Define `struct i256([u8; 32])` and implement `ArrowNativeType` for it

In order to do unified processing, I suggest using `[u8; 16]` to represent `i128` in the same way as `i256`. We can then implement operations such as comparison and arithmetic directly on the `u8` arrays (a rough sketch of this idea is included at the end of this comment).

> * Implement `ArrowPrimitiveType` for `Decimal128Type` and `Decimal256Type` with `Native` of `i128` and `i256` respectively

Using `i128([u8;16])` and `i256([u8;32])` as the `Native` types is a good point. My concern is the `const DATA_TYPE: DataType` for this: the decimal data types are effectively infinite and can't be enumerated. We can't enumerate every decimal type the way we can enumerate timestamps with their limited set of time units. How do we determine the `DataType` for such an `ArrowPrimitiveType`? This is the key point for me (see the second sketch at the end of this comment).

> * Replace DecimalArray and DecimalBuilder with PrimitiveArray and PrimitiveBuilder
> * Add specialized `impl PrimitiveArray<T: DecimalType>` adding `with_precision_and_scale`
> * Add specialized `impl PrimitiveBuilder<T: DecimalType>` adding `with_precision_and_scale`
> * Add `ArrayData::validate_decimal_values` which validates values are within bounds of decimal precision
> * Remove old decimal logic, including specialized take, comparison kernels, etc...
> * Add public free function to format signed little endian byte arrays with a given precision and scale
> * Implement ArrowNativeOps for `i128` and `i256`
> * Casting precision becomes purely a metadata operation, casting scale can use the normal arithmetic operations
>
> This appears to work quite well, and I'm fairly happy this will achieve the stated aims of massively simplifying the decimal implementation, making it more consistent with other types such as timestamps, and provide an easy path forward to getting full kernel coverage.
>
> There are, however, a couple of major implications of this change:
>
> * The values returned from `PrimitiveArray::value` will be the storage type, and **not** a semantic `Decimal` type. This is the same as timestamps, times, durations, etc... and I think is therefore fine, but is worth highlighting
> * Precision will **not** be validated by default, instead only via explicit calls to `validate_decimal_values` [Decimal Precision Validation #2387](https://github.com/apache/arrow-rs/issues/2387). This makes the builders, `FromIterator` etc... consistently infallible (currently this is inconsistent)
> * Overflow for checked kernels will be at the boundary of the storage type, i.e. `i128` or `i256` and not based on precision, users who wish to catch precision-based overflow will need to use the checked calls and then make an explicit call to validate the result
> * Arithmetic, comparison, etc... across arrays with different scale will be non-trivial (although it would be complicated regardless)

I think we don't need to support operations across arrays with different scales; we should instead add a checker that enforces this assumption. If the `Native` type for a decimal carried its data type (precision and scale), could this be resolved?

> If people are happy with this path forward, I will polish up this approach over the coming week, but I'm loathe to work on this further if the consensus isn't there. Even better would be if someone is willing to help out with this effort 😄?
>
> Thoughts @alamb @liukun4515 @HaoYang670 @kmitchener @viirya?

I personally still lean towards implementing the decimal data types on their own, rather than combining them with all the primitive data types.
That said, this is also an approach I have considered before, and I basically agree with your proposal, with the comments and questions above. @tustvold
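
To make the byte-array suggestion above a little more concrete, here is a rough, self-contained sketch of a 256-bit signed value stored as 32 little-endian bytes, with comparison implemented directly on the bytes; the same pattern would cover `i128` as `[u8; 16]`. The type and method names are illustrative only, not the actual arrow-rs API:

```rust
use std::cmp::Ordering;

/// Illustrative stand-in for the proposed `struct i256([u8; 32])`:
/// a 256-bit signed integer stored as 32 little-endian bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct I256([u8; 32]);

impl I256 {
    /// Sign-extend an `i128` into the 32-byte little-endian representation.
    pub fn from_i128(v: i128) -> Self {
        let mut bytes = [if v < 0 { 0xFF } else { 0x00 }; 32];
        bytes[..16].copy_from_slice(&v.to_le_bytes());
        I256(bytes)
    }
}

impl Ord for I256 {
    fn cmp(&self, other: &Self) -> Ordering {
        // Two's-complement trick: flip the sign bit of the most significant
        // byte, then compare bytes from most significant to least significant.
        let flip = |b: &[u8; 32]| {
            let mut v = *b;
            v[31] ^= 0x80;
            v
        };
        flip(&self.0).iter().rev().cmp(flip(&other.0).iter().rev())
    }
}

impl PartialOrd for I256 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    assert!(I256::from_i128(-1) < I256::from_i128(0));
    assert!(I256::from_i128(42) > I256::from_i128(-42));
}
```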
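
And to illustrate the `const DATA_TYPE` concern: the associated constant pins one `DataType` per Rust type, which works for something like `Int32` but not for a family parameterized by precision and scale. The enum and trait below are simplified stand-ins, not the real arrow definitions; the resolution shown (a placeholder value at the type level, with the concrete precision and scale carried on the array instead) is only one hypothetical option:

```rust
#![allow(dead_code)]

// Simplified stand-ins for arrow's `DataType` and `ArrowPrimitiveType`,
// only to show why a single associated constant is awkward for decimals.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    // A decimal data type carries precision and scale, so there is no single
    // canonical `DataType` value covering all Decimal128 arrays.
    Decimal128(u8, i8),
}

trait ArrowPrimitiveType {
    type Native;
    const DATA_TYPE: DataType;
}

struct Int32Type;
impl ArrowPrimitiveType for Int32Type {
    type Native = i32;
    const DATA_TYPE: DataType = DataType::Int32; // unambiguous
}

struct Decimal128Type;
impl ArrowPrimitiveType for Decimal128Type {
    type Native = i128;
    // Hypothetical resolution: a placeholder (maximum precision, zero scale)
    // at the type level; the actual precision and scale would live on the
    // array metadata via something like `with_precision_and_scale`.
    const DATA_TYPE: DataType = DataType::Decimal128(38, 0);
}

fn main() {
    assert_eq!(Decimal128Type::DATA_TYPE, DataType::Decimal128(38, 0));
}
```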
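
Finally, a minimal sketch of the quoted point that casting precision is metadata-only while casting scale is ordinary arithmetic on the native value: changing the scale just multiplies or divides by a power of ten. The function name, truncation behaviour, and the absence of precision checks here are assumptions for illustration:

```rust
// Rescale a raw i128 decimal value from one scale to another.
// `checked_*` keeps overflow observable instead of panicking or wrapping.
fn rescale_i128(value: i128, from_scale: u32, to_scale: u32) -> Option<i128> {
    if to_scale >= from_scale {
        value.checked_mul(10_i128.checked_pow(to_scale - from_scale)?)
    } else {
        value.checked_div(10_i128.checked_pow(from_scale - to_scale)?)
    }
}

fn main() {
    // 1.23 at scale 2 becomes 1.2300 at scale 4 (raw value 12300).
    assert_eq!(rescale_i128(123, 2, 4), Some(12_300));
    // 1.239 at scale 3 truncates to 1.23 at scale 2 (raw value 123).
    assert_eq!(rescale_i128(1_239, 3, 2), Some(123));
}
```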
