It seems that it's better to wait until Decimal256 implementation will be 
merged in master, before starting to implement low-bit Decimals. Decimal256 
should provide a reference for how we should define new Decimal types at C++ 
part.

Also, as it was mentioned in this [1] comment, the introduction of Decimal256 
should define the way of adding new Decimal types with minimum code duplication 
at type declarations. I've prepared draft PR [2] for Decimal256 branch that 
proposes a way of minimizing code duplication. They're probably more places 
where we can reduce code duplication than I've touched in this PR, but it's 
just a draft to have some base for discussion.

[1] https://github.com/apache/arrow/pull/8190#issuecomment-693191522
[2] https://github.com/apache/arrow/pull/8417


On 2020/10/09 03:15:06 Micah Kornfield wrote:
> Hi Dmitry,
> Thanks for volunteering to contribute. Note that we are in the process of
> implementing Decimal256 support already, which is currently on a separate
> branch [1] but I'm hoping to have PR sometime early next week. If we are
> proposing adding support for these lower bit-widths, I think we will likely
> need an implementation in Java (maybe someone else is interested in doing
> this on that side). For reference [2] is the discussion for adding Decimal
> 256-bit width
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/arrow/tree/decimal256
> [2]
> https://mail-archives.apache.org/mod_mbox/arrow-dev/202007.mbox/%3CCAK7Z5T9VgsTmjD%3Dmdq1Sci5D0%2BVZPVKWzxCWfHErUdNjak1X9w%40mail.gmail.com%3E
>
> On Thu, Oct 8, 2020 at 2:12 PM Wes McKinney 
> <we...@gmail.com<mailto:we...@gmail.com>> wrote:
>
> > Based in what I've seen in other systems (like Apache Kudu) that
> > support at least 32/64-bit decimal, representing them with a single
> > integer value is probably the best thing (in terms of computing
> > performance, consistency with other implementations)
> >
> > I added you as a contributor on Jira so you can assign yourself issues now
> >
> > On Thu, Oct 8, 2020 at 2:33 PM Chigarev, Dmitry
> > <dm...@intel.com<mailto:dm...@intel.com>> wrote:
> > >
> > > Hi everyone, I would like to work on this JIRA ticket:
> > > https://issues.apache.org/jira/browse/ARROW-9404
> > > ([C++] Add support for Decimal16, Decimal32 and Decimal64)
> > >
> > > This will be my first experience with contributing to Arrow, so I want
> > to ask advice what approach should I use.
> > > As far as I know, currently, Arrow supports only Decimal128 and its
> > basic and primary implementations located at
> > `cpp/src/arrow/util/basic_decimal.h` and `.../util/decimal.h`. In current
> > implementation 128-bit Decimal represented by two 64-bit integers. So there
> > are several approaches that can be applied:
> > >
> > > 1. From current BasicDecimal128 class make a template class
> > BasicDecimal<bit_width>, whose `low` and `high` variable types will be
> > dependent on `bit_width` template parameter, implementation of methods also
> > will be rewritten to depend on `bit_width`. As the result, we will have
> > classes that work with `bit_width / 2`-bit integers.
> > > The disadvantage of this approach is that even when we can fit our
> > decimal in a single int value, we will still be splitting it into two
> > variables and have to do that unnecessary extra logic of handling these two
> > variables. But all Decimals will be the instances of one template class and
> > will be consistent with each other.
> > >
> > > 2. Implement new template class BasicDecimal<bit_width> and
> > Decimal<bit_width> which will work only for bit_width <= 64 (where we can
> > represent our decimal with a single `int##bit_width##_t` variable), and
> > also reimplement all methods of Decimals in this new class.
> > > But that approach makes ambiguous what Decimal is, because technically
> > Decimal64 and Decimal128 will be completely different classes which can
> > create some inconsistency between them.
> > >
> > > 3. If we have some variable that indicates a maximum bit integer, then
> > we can try to apply the following approach. Define template
> > BasicDecimal<bit_width> class whose value will be represented not by ints
> > variables, but by array of ints:
> > > ```
> > > // Pseudo-code, may be incorrect
> > > template<int width>
> > > class BasicDecimal{
> > > using int_type = IntBitWidthTypes<max(width, MAX_INT_WIDTH)>::type;
> > > int_type values[max(width / MAX_INT_WIDTH, 1)];
> > > // all of these can be computed at the compile time
> > > ....
> > > };
> > >
> > > using BasicDecimal128 = BasicDecimal<128>;
> > > using BasicDecimal64 = BasicDecimal<64>;
> > > ....
> > > ```
> > > In the result, Decimal128 will have an uint64_t array of 2 elements,
> > Decimal64 will have an uint64_t array of 1 element, Decimal32 - uint32_t
> > array of 1 element, and so on...
> > > This also allows us to define decimals of arbitrary bitness. For
> > example, Decimal256 will be represented as an array of uint64_t with 4
> > elements.
> > > The bad side of this approach is its complexity - we need to rewrite the
> > whole BasicDecimal and Decimal class.
> > >
> > > Which one of these approaches will be the correct one?
> > >
> > > P.S. I've just noticed that I'm not being able to assign JIRA tasks for
> > myself, how can I do this?
> > >
> > >
> > > --------------------------------------------------------------------
> > > Joint Stock Company Intel A/O
> > > Registered legal address: Krylatsky Hills Business Park,
> > > 17 Krylatskaya Str., Bldg 4, Moscow 121614,
> > > Russian Federation
> > >
> > > This e-mail and any attachments may contain confidential material for
> > > the sole use of the intended recipient(s). Any review or distribution
> > > by others is strictly prohibited. If you are not the intended
> > > recipient, please contact the sender and delete all copies.
> >
>

--------------------------------------------------------------------
Joint Stock Company Intel A/O
Registered legal address: Krylatsky Hills Business Park,
17 Krylatskaya Str., Bldg 4, Moscow 121614,
Russian Federation

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Reply via email to