Hi everyone, I would like to work on this JIRA ticket:
https://issues.apache.org/jira/browse/ARROW-9404
([C++] Add support for Decimal16, Decimal32 and Decimal64)
This will be my first contribution to Arrow, so I'd like to ask for advice
on which approach to take.
As far as I know, Arrow currently supports only Decimal128, with its basic and
primary implementations located in `cpp/src/arrow/util/basic_decimal.h` and
`.../util/decimal.h`. In the current implementation, the 128-bit decimal is
represented by two 64-bit integers. I see several approaches that could be applied:
1. Turn the current BasicDecimal128 class into a template class
BasicDecimal<bit_width>, where the types of the `low` and `high` members depend
on the `bit_width` template parameter, and rewrite the method implementations
to depend on `bit_width` as well. As a result, we would have classes that work
with `bit_width / 2`-bit integers (see the sketch below).
The disadvantage of this approach is that even when the decimal fits in a
single integer, we would still split it into two variables and carry the
unnecessary extra logic of handling them. On the other hand, all decimals would
be instances of a single template class and therefore consistent with each
other.
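A minimal sketch of what this could look like (my own illustration, not
existing Arrow code; HalfWidthTraits and the member names are made up):
```
#include <cstdint>

// Hypothetical trait mapping bit_width to its two half-width member types.
template <int bit_width>
struct HalfWidthTraits;

template <> struct HalfWidthTraits<16>  { using high_t = int8_t;  using low_t = uint8_t;  };
template <> struct HalfWidthTraits<32>  { using high_t = int16_t; using low_t = uint16_t; };
template <> struct HalfWidthTraits<64>  { using high_t = int32_t; using low_t = uint32_t; };
template <> struct HalfWidthTraits<128> { using high_t = int64_t; using low_t = uint64_t; };

template <int bit_width>
class BasicDecimal {
 public:
  using high_t = typename HalfWidthTraits<bit_width>::high_t;
  using low_t = typename HalfWidthTraits<bit_width>::low_t;

  constexpr BasicDecimal(high_t high, low_t low) : high_(high), low_(low) {}
  // Arithmetic would be written once against the (high_, low_) pair,
  // parameterized only by bit_width.

 private:
  high_t high_;  // signed upper half
  low_t low_;    // unsigned lower half
};

using BasicDecimal64 = BasicDecimal<64>;
using BasicDecimal128 = BasicDecimal<128>;
```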
2. Implement new template classes BasicDecimal<bit_width> and Decimal<bit_width>
that work only for bit_width <= 64 (where the decimal can be represented by a
single `int##bit_width##_t` variable), and reimplement all of the decimal
methods in these new classes (see the sketch below).
However, this approach makes it ambiguous what a Decimal is, because Decimal64
and Decimal128 would be completely different classes, which could introduce
some inconsistency between them.
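A similar sketch for this approach, again only an illustration with made-up
names (SingleIntType, SmallBasicDecimal):
```
#include <cstdint>
#include <type_traits>

// Hypothetical mapping from bit_width (<= 64) to a single signed integer type.
template <int bit_width>
using SingleIntType = std::conditional_t<
    bit_width == 16, int16_t,
    std::conditional_t<bit_width == 32, int32_t, int64_t>>;

template <int bit_width>
class SmallBasicDecimal {
  static_assert(bit_width <= 64, "only widths that fit a single integer");

 public:
  using value_type = SingleIntType<bit_width>;

  constexpr explicit SmallBasicDecimal(value_type value) : value_(value) {}
  // Arithmetic maps directly onto native integer operations on value_.

 private:
  value_type value_;
};

using BasicDecimal16 = SmallBasicDecimal<16>;
using BasicDecimal32 = SmallBasicDecimal<32>;
using BasicDecimal64 = SmallBasicDecimal<64>;
```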
3. If we have a constant that indicates the maximum native integer width, we
can try the following approach: define a template class BasicDecimal<bit_width>
whose value is represented not by separate integer variables, but by an array
of integers:
```
// Rough sketch of the idea, not actual Arrow code
#include <cstdint>
#include <type_traits>

constexpr int kMaxIntWidth = 64;  // widest native integer used per element

// Map a bit width (<= 64) to the matching unsigned integer type.
template <int width>
using IntOfWidth = std::conditional_t<
    width <= 16, uint16_t,
    std::conditional_t<width <= 32, uint32_t, uint64_t>>;

template <int width>
class BasicDecimal {
  // All of these are computed at compile time.
  using int_type = IntOfWidth<(width < kMaxIntWidth) ? width : kMaxIntWidth>;
  static constexpr int kNumElements =
      (width >= kMaxIntWidth) ? width / kMaxIntWidth : 1;
  int_type values_[kNumElements];
  // arithmetic, comparisons, etc. are implemented over values_
};

using BasicDecimal128 = BasicDecimal<128>;  // uint64_t[2]
using BasicDecimal64 = BasicDecimal<64>;    // uint64_t[1]
using BasicDecimal32 = BasicDecimal<32>;    // uint32_t[1]
```
As a result, Decimal128 would hold a uint64_t array of 2 elements, Decimal64 a
uint64_t array of 1 element, Decimal32 a uint32_t array of 1 element, and so on.
This also lets us define decimals of arbitrary width; for example, Decimal256
would be represented as an array of 4 uint64_t elements.
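If it helps, the compile-time layout of that sketch could be checked directly
(still just my illustration, building on the block above):
```
// Continuing the sketch above: the layouts match the sizes described.
static_assert(sizeof(BasicDecimal128) == 2 * sizeof(uint64_t), "uint64_t[2]");
static_assert(sizeof(BasicDecimal64) == sizeof(uint64_t), "uint64_t[1]");
static_assert(sizeof(BasicDecimal32) == sizeof(uint32_t), "uint32_t[1]");
static_assert(sizeof(BasicDecimal<256>) == 4 * sizeof(uint64_t), "uint64_t[4]");
```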
The downside of this approach is its complexity: we would need to rewrite the
whole BasicDecimal and Decimal classes.
Which of these approaches would be the right one?
P.S. I've just noticed that I'm not able to assign JIRA tickets to myself.
How can I do this?