Based in what I've seen in other systems (like Apache Kudu) that support at least 32/64-bit decimal, representing them with a single integer value is probably the best thing (in terms of computing performance, consistency with other implementations)
I added you as a contributor on Jira so you can assign yourself issues now On Thu, Oct 8, 2020 at 2:33 PM Chigarev, Dmitry <dmitry.chiga...@intel.com> wrote: > > Hi everyone, I would like to work on this JIRA ticket: > https://issues.apache.org/jira/browse/ARROW-9404 > ([C++] Add support for Decimal16, Decimal32 and Decimal64) > > This will be my first experience with contributing to Arrow, so I want to ask > advice what approach should I use. > As far as I know, currently, Arrow supports only Decimal128 and its basic and > primary implementations located at `cpp/src/arrow/util/basic_decimal.h` and > `.../util/decimal.h`. In current implementation 128-bit Decimal represented > by two 64-bit integers. So there are several approaches that can be applied: > > 1. From current BasicDecimal128 class make a template class > BasicDecimal<bit_width>, whose `low` and `high` variable types will be > dependent on `bit_width` template parameter, implementation of methods also > will be rewritten to depend on `bit_width`. As the result, we will have > classes that work with `bit_width / 2`-bit integers. > The disadvantage of this approach is that even when we can fit our decimal in > a single int value, we will still be splitting it into two variables and have > to do that unnecessary extra logic of handling these two variables. But all > Decimals will be the instances of one template class and will be consistent > with each other. > > 2. Implement new template class BasicDecimal<bit_width> and > Decimal<bit_width> which will work only for bit_width <= 64 (where we can > represent our decimal with a single `int##bit_width##_t` variable), and also > reimplement all methods of Decimals in this new class. > But that approach makes ambiguous what Decimal is, because technically > Decimal64 and Decimal128 will be completely different classes which can > create some inconsistency between them. > > 3. If we have some variable that indicates a maximum bit integer, then we can > try to apply the following approach. Define template BasicDecimal<bit_width> > class whose value will be represented not by ints variables, but by array of > ints: > ``` > // Pseudo-code, may be incorrect > template<int width> > class BasicDecimal{ > using int_type = IntBitWidthTypes<max(width, MAX_INT_WIDTH)>::type; > int_type values[max(width / MAX_INT_WIDTH, 1)]; > // all of these can be computed at the compile time > .... > }; > > using BasicDecimal128 = BasicDecimal<128>; > using BasicDecimal64 = BasicDecimal<64>; > .... > ``` > In the result, Decimal128 will have an uint64_t array of 2 elements, > Decimal64 will have an uint64_t array of 1 element, Decimal32 - uint32_t > array of 1 element, and so on... > This also allows us to define decimals of arbitrary bitness. For example, > Decimal256 will be represented as an array of uint64_t with 4 elements. > The bad side of this approach is its complexity - we need to rewrite the > whole BasicDecimal and Decimal class. > > Which one of these approaches will be the correct one? > > P.S. I've just noticed that I'm not being able to assign JIRA tasks for > myself, how can I do this? > > > -------------------------------------------------------------------- > Joint Stock Company Intel A/O > Registered legal address: Krylatsky Hills Business Park, > 17 Krylatskaya Str., Bldg 4, Moscow 121614, > Russian Federation > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies.