Hi Wes,

Thanks a lot for your quick reply.
I think what you mentioned is almost exactly what we want to do in Java. The
concept itself is not important.

There may be only a few minor differences:
1. In C++, the null_count is mutable, while in Java, once a vector is
constructed as non-nullable, its null count can only be 0.
2. In C++, a non-nullable array's validity buffer is null, while in Java,
it is an empty buffer that cannot be changed.

Best,
Liya Fan

On Tue, Mar 10, 2020 at 9:26 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Liya,
>
> In C++ we elect certain faster code paths when the null count is 0 or
> computed to be zero. When the null count is 0, we do not allocate a
> validity bitmap. And there is a "nullable" metadata-only flag at the
> Field level. Could the same kinds of optimizations be implemented in
> Java without introducing a "nullable" concept?
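The approach described above can be sketched in Java without any nullability flag. The class below is hypothetical (it is not the Arrow Java API, and `LazyValidityIntColumn`, `setNull` and `sum` are illustrative names): it allocates a validity bitmap only when the first null is written, tracks the null count, and lets `sum()` take a loop with no per-element validity check whenever the null count is 0.

```java
import java.util.Arrays;

/**
 * Hypothetical int column (NOT the Arrow Java API) with no "nullable" flag.
 * The validity bitmap is allocated lazily on the first null, and readers
 * choose a faster path whenever the null count is 0.
 */
final class LazyValidityIntColumn {
    private final int[] values;
    private long[] validity;  // stays null until the first null; bit set = valid
    private int nullCount;

    LazyValidityIntColumn(int length) {
        values = new int[length];
    }

    void set(int index, int value) {
        values[index] = value;
        // Only touch the bitmap if one exists and this slot was previously null.
        if (validity != null && !isValidBit(index)) {
            validity[index >>> 6] |= 1L << (index & 63);
            nullCount--;
        }
    }

    void setNull(int index) {
        if (validity == null) {
            // First null: allocate the bitmap with every slot marked valid.
            validity = new long[(values.length + 63) >>> 6];
            Arrays.fill(validity, -1L);
        }
        if (isValidBit(index)) {
            validity[index >>> 6] &= ~(1L << (index & 63));
            nullCount++;
        }
    }

    boolean isNull(int index) {
        return validity != null && !isValidBit(index);
    }

    int getNullCount() {
        return nullCount;
    }

    boolean hasValidityBitmap() {
        return validity != null;
    }

    /** Sums the non-null values. */
    long sum() {
        long total = 0;
        if (nullCount == 0) {
            // Fast path: no nulls, so no per-element validity check.
            for (int v : values) {
                total += v;
            }
        } else {
            // Slow path: consult the bitmap for every slot.
            for (int i = 0; i < values.length; i++) {
                if (isValidBit(i)) {
                    total += values[i];
                }
            }
        }
        return total;
    }

    private boolean isValidBit(int index) {
        return (validity[index >>> 6] & (1L << (index & 63))) != 0;
    }
}
```

In this sketch the fast path is chosen from the null count at run time, so no schema-level flag is needed; the trade-off is that a column that has ever held a null keeps its bitmap.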
>
> - Wes
>
> On Tue, Mar 10, 2020 at 8:13 AM Fan Liya <liya.fa...@gmail.com> wrote:
> >
> > Dear all,
> >
> > A non-nullable vector is one that is guaranteed to contain no nulls. We
> > want to support non-nullable vectors in Java.
> >
> > *Motivations:*
> > 1. Non-nullability is widely used in practice. For example, in a database
> > engine, a column can be declared as not null, so it cannot contain null values.
> > 2. Non-nullable vectors have significant performance advantages over
> > their nullable counterparts, such as:
> >   1) the memory space of the validity buffer can be saved;
> >   2) manipulation of the validity buffer can be bypassed;
> >   3) some if-else branches can be replaced by sequential instructions (by
> > the JIT compiler), leading to higher throughput in the CPU pipeline.
> >
> > *Potential Cost:*
> > For nullable vectors, there can be extra checks against the nullability
> > flag, so we must change the code in a way that minimizes this cost.
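One way to keep that cost low is to test the flag once per bulk operation instead of once per element. The sketch below is hypothetical (`BulkLoadIntVector`, `setAll` and `validityWordCount` are illustrative names, not Arrow Java APIs) and assumes a `final` nullability flag fixed at construction, as in the proposed changes.

```java
/**
 * Hypothetical vector (NOT the Arrow Java API) whose nullability is fixed at
 * construction. Bulk operations check the flag once per call, so the
 * non-nullable path pays for one branch rather than one per element.
 */
final class BulkLoadIntVector {
    private final boolean nullable;
    private final int[] values;
    private final long[] validity;  // empty when non-nullable; bit set = valid

    BulkLoadIntVector(int length, boolean nullable) {
        this.nullable = nullable;
        this.values = new int[length];
        this.validity = nullable ? new long[(length + 63) >>> 6] : new long[0];
    }

    /** Copies src into slots [0, src.length) and marks them valid. */
    void setAll(int[] src) {
        if (src.length > values.length) {
            throw new IllegalArgumentException("source is longer than the vector");
        }
        System.arraycopy(src, 0, values, 0, src.length);
        if (!nullable) {
            return;  // one check per call: there is no validity buffer to update
        }
        for (int i = 0; i < src.length; i++) {
            validity[i >>> 6] |= 1L << (i & 63);
        }
    }

    boolean isNull(int index) {
        return nullable && (validity[index >>> 6] & (1L << (index & 63))) == 0;
    }

    int get(int index) {
        return values[index];
    }

    int validityWordCount() {
        return validity.length;
    }
}
```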
> >
> > *Proposed Changes:*
> > 1. There is no need to create new vector classes. We add a final boolean
> > to the vector base classes as the nullability flag. The value of the flag
> > can be obtained from the field when creating the vector.
> > 2. Add a method "boolean isNullable()" to the root interface ValueVector.
> > 3. If a vector is non-nullable, its validity buffer should be an empty
> > buffer (not null, so much of the existing logic can be left unchanged).
> > 4. For operations involving validity buffers (e.g. isNull, get, set), we
> > use the nullability flag to bypass manipulation of the validity buffer.
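The four proposed changes can be sketched with simplified stand-ins. `SimpleField`, `SimpleValueVector` and `SimpleIntVector` are hypothetical types, not the real Arrow Java classes, and the bitmap layout (one bit per slot, set = valid) is an assumption made for the sketch.

```java
/** Hypothetical stand-in for a field descriptor carrying a nullable flag. */
final class SimpleField {
    final String name;
    final boolean nullable;

    SimpleField(String name, boolean nullable) {
        this.name = name;
        this.nullable = nullable;
    }
}

/** Hypothetical stand-in for the root interface, with the proposed isNullable(). */
interface SimpleValueVector {
    boolean isNullable();
    boolean isNull(int index);
    int getNullCount();
}

/** Hypothetical int vector following the four proposed changes. */
final class SimpleIntVector implements SimpleValueVector {
    private static final long[] EMPTY_VALIDITY = new long[0];

    private final boolean nullable;  // change 1: final flag read from the field
    private final int[] values;
    private final long[] validity;   // change 3: empty, not null, when non-nullable

    SimpleIntVector(SimpleField field, int length) {
        this.nullable = field.nullable;
        this.values = new int[length];
        this.validity = nullable ? new long[(length + 63) >>> 6] : EMPTY_VALIDITY;
    }

    @Override
    public boolean isNullable() {  // change 2
        return nullable;
    }

    // Change 4: the flag short-circuits every validity-buffer access below.
    @Override
    public boolean isNull(int index) {
        return nullable && (validity[index >>> 6] & (1L << (index & 63))) == 0;
    }

    @Override
    public int getNullCount() {
        if (!nullable) {
            return 0;  // always 0 for a non-nullable vector; no scan needed
        }
        int validBits = 0;
        for (long word : validity) {
            validBits += Long.bitCount(word);
        }
        return values.length - validBits;
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("value at index " + index + " is null");
        }
        return values[index];
    }

    void set(int index, int value) {
        values[index] = value;
        if (nullable) {
            validity[index >>> 6] |= 1L << (index & 63);
        }
    }

    void setNull(int index) {
        if (!nullable) {
            throw new UnsupportedOperationException("vector is non-nullable");
        }
        validity[index >>> 6] &= ~(1L << (index & 63));
    }

    int validityBufferLength() {
        return validity.length;
    }
}
```

In this sketch, keeping the non-nullable validity buffer as an empty array rather than `null` means code that merely passes the buffer along or reads its length does not need a null check.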
> >
> > Therefore, it should be possible to support the feature with small code
> > changes.
> >
> > BTW, please note that similar behavior is already supported in C++.
> >
> > Would you please give your valuable feedback?
> >
> > Best,
> > Liya Fan
>
