Dear all,

We would like to add support for converting between delta vectors and
partial sum vectors. Your feedback is welcome.

Best,

Liya Fan

What is a delta vector/partial sum vector?

Given an integer vector a with length n, its partial sum vector is another
integer vector b with length n + 1, with values defined as:

b(0) = initial sum
b(i) = b(0) + a(0) + a(1) + ... + a(i - 1), i = 1, 2, ..., n

Given an integer vector a with length n + 1, its delta vector is another
integer vector b with length n, with values defined as:

b(i) = a(i + 1) - a(i), i = 0, 1, ..., n - 1

In this issue, we provide utilities to convert between delta vectors and
partial sum vectors. It is interesting to note that the two operations
correspond to discrete integration and differentiation.
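
To make the two definitions concrete, here is a minimal sketch using plain
Python lists in place of Arrow vectors. The function names and the
initial_sum parameter are illustrative assumptions, not part of any existing
Arrow API. It also shows that converting to a partial sum vector and back
recovers the original delta vector.

```python
from typing import List


def to_partial_sum(deltas: List[int], initial_sum: int = 0) -> List[int]:
    """Return the partial sum vector (length n + 1) of a delta vector (length n)."""
    sums = [initial_sum]
    for d in deltas:
        # Each entry adds the next delta to the running total.
        sums.append(sums[-1] + d)
    return sums


def to_delta(sums: List[int]) -> List[int]:
    """Return the delta vector (length n) of a partial sum vector (length n + 1)."""
    return [sums[i + 1] - sums[i] for i in range(len(sums) - 1)]


deltas = [3, 1, 4, 1, 5]
sums = to_partial_sum(deltas, initial_sum=10)
print(sums)            # [10, 13, 14, 18, 19, 24]
print(to_delta(sums))  # [3, 1, 4, 1, 5]
```

Note that the initial sum is lost when converting to delta, so it has to be
kept separately if the partial sum vector must be reconstructed exactly.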

These conversions have wide applications. For example,

   1. The run-length vector proposed by Micah is based on the partial sum
      vector, while the deduplication functionality is based on the delta
      vector. This issue provides conversions between them.

   2. The current VarCharVector/VarBinaryVector implementations are based on
      the partial sum vector. We can transform them to delta vectors before
      IPC, to reduce network traffic.

   3. Converting to delta can be considered a form of data compression. The
      operation can be applied more than once to further reduce the data
      volume.
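
As a rough illustration of points 2 and 3, the sketch below again uses plain
Python lists rather than Arrow vectors. The helper names are hypothetical.
It first turns the offsets of a variable-width vector into value lengths,
then applies the delta operation twice to a regularly growing sequence.
Whether this actually saves space depends on the data and on a later
encoding step that benefits from small values.

```python
from typing import List, Tuple


def delta(values: List[int]) -> List[int]:
    """Differences between consecutive elements; the result is one shorter."""
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]


def repeated_delta(values: List[int], times: int) -> Tuple[List[int], List[int]]:
    """Apply delta `times` times; return (saved first elements, final deltas).

    Assumes len(values) >= times so that every pass has a first element.
    """
    heads = []
    for _ in range(times):
        heads.append(values[0])  # needed later to invert this pass
        values = delta(values)
    return heads, values


# Offsets of the strings "a", "bcd", "", "ef" stored back to back.
offsets = [0, 1, 4, 4, 6]
print(delta(offsets))  # [1, 3, 0, 2]

# A sequence with a constant step collapses to zeros after two passes.
sequence = [100, 110, 120, 130, 140]
print(repeated_delta(sequence, 2))  # ([100, 10], [0, 0, 0])
```

The saved first elements play the role of the initial sums: one per pass is
needed to invert the conversion.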

Points to discuss:
Should the API be provided at the level of vector, ArrowBuf, or both?
1. If it is based on vector, there can be performance overhead due to
virtual method calls.
2. If it is based on ArrowBuf, some underlying details (type width) are
exposed to the end user, which breaks encapsulation.
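
To show the trade-off in API shape, here is a small sketch of both options
in Python. ToyIntVector and to_delta_buffer are hypothetical stand-ins and
do not correspond to Arrow's vector classes or ArrowBuf. The point is only
that the buffer-level function needs the caller to pass the type width,
while the vector-level method hides it.

```python
import struct
from typing import List


class ToyIntVector:
    """Hypothetical vector-level API: the element width stays hidden."""

    def __init__(self, values: List[int]) -> None:
        self.values = list(values)

    def to_delta(self) -> "ToyIntVector":
        v = self.values
        return ToyIntVector([v[i + 1] - v[i] for i in range(len(v) - 1)])


def to_delta_buffer(buf: bytes, type_width: int) -> bytes:
    """Hypothetical buffer-level API: the caller must supply the width in bytes.

    Elements are read as little-endian signed integers. Overflow of a
    difference beyond type_width bytes is not handled in this sketch.
    """
    count = len(buf) // type_width
    values = [
        int.from_bytes(buf[i * type_width:(i + 1) * type_width], "little", signed=True)
        for i in range(count)
    ]
    out = bytearray()
    for i in range(count - 1):
        out += (values[i + 1] - values[i]).to_bytes(type_width, "little", signed=True)
    return bytes(out)


# Vector level: no knowledge of the physical layout is needed.
print(ToyIntVector([0, 1, 4, 4, 6]).to_delta().values)  # [1, 3, 0, 2]

# Buffer level: the same data as five 4-byte integers, width passed explicitly.
raw = struct.pack("<5i", 0, 1, 4, 4, 6)
print(struct.unpack("<4i", to_delta_buffer(raw, 4)))  # (1, 3, 0, 2)
```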
