Hi Wes,

Thanks a lot for your quick reply.
I think what you mentioned is almost exactly what we want to do in Java.
The concept is not important.

Maybe there are only some minor differences:
1. In C++, the null_count is mutable, while in Java, once a vector is
constructed as non-nullable, its null count can only be 0.
2. In C++, a non-nullable array's validity buffer is null, while in Java,
the buffer is an empty buffer and cannot be changed.

Best,
Liya Fan

On Tue, Mar 10, 2020 at 9:26 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Liya,
>
> In C++ we elect certain faster code paths when the null count is 0 or
> computed to be zero. When the null count is 0, we do not allocate a
> validity bitmap. And there is a "nullable" metadata-only flag at the
> Field level. Could the same kinds of optimizations be implemented in
> Java without introducing a "nullable" concept?
>
> - Wes
>
> On Tue, Mar 10, 2020 at 8:13 AM Fan Liya <liya.fa...@gmail.com> wrote:
> >
> > Dear all,
> >
> > A non-nullable vector is one that is guaranteed to contain no nulls. We
> > want to support non-nullable vectors in Java.
> >
> > *Motivations:*
> > 1. It is widely used in practice. For example, in a database engine, a
> > column can be declared as not null, so it cannot contain null values.
> > 2. Non-nullable vectors have significant performance advantages compared
> > with their nullable counterparts, such as:
> >   1) the memory space of the validity buffer can be saved.
> >   2) manipulation of the validity buffer can be bypassed.
> >   3) some if-else branches can be replaced by sequential instructions (by
> >   the JIT compiler), leading to high throughput for the CPU pipeline.
> >
> > *Potential Cost:*
> > For nullable vectors, there can be extra checks against the nullability
> > flag. So we must change the code in a way that minimizes the cost.
> >
> > *Proposed Changes:*
> > 1. There is no need to create new vector classes. We add a final boolean
> > to the vector base classes as the nullability flag. The value of the flag
> > can be obtained from the field when creating the vector.
> > 2. Add a method "boolean isNullable()" to the root interface ValueVector.
> > 3. If a vector is non-nullable, its validity buffer should be an empty
> > buffer (not null, so much of the existing logic can be left unchanged).
> > 4. For operations involving validity buffers (e.g. isNull, get, set), we
> > use the nullability flag to bypass manipulations of the validity buffer.
> >
> > Therefore, it should be possible to support the feature with small code
> > changes.
> >
> > BTW, please note that similar behaviors have already been supported in
> > C++.
> >
> > Would you please give your valuable feedback?
> >
> > Best,
> > Liya Fan
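
To make the four proposed changes in the original message more concrete, here is a minimal sketch of how they might fit together. It is only an illustration under stated assumptions: SketchIntVector, validityBufferSize() and the plain byte[] / int[] storage are hypothetical stand-ins, not the actual Arrow Java classes or buffers. Only the method name isNullable() and the operations isNull, get and set come from the proposal.

```java
/**
 * Hypothetical stand-in for an int vector. This is NOT an Arrow class;
 * plain arrays replace Arrow's off-heap buffers to keep the sketch small.
 */
final class SketchIntVector {
    // Shared empty validity buffer for non-nullable vectors (proposed change 3).
    private static final byte[] EMPTY_VALIDITY = new byte[0];

    // Nullability flag, fixed at construction (proposed change 1).
    private final boolean nullable;
    // One validity bit per slot; zero-length when the vector is non-nullable.
    private final byte[] validity;
    private final int[] values;

    SketchIntVector(int capacity, boolean nullable) {
        this.nullable = nullable;
        this.values = new int[capacity];
        // A nullable vector starts with all bits cleared, i.e. all slots null.
        this.validity = nullable ? new byte[(capacity + 7) / 8] : EMPTY_VALIDITY;
    }

    // Proposed change 2: expose the flag.
    boolean isNullable() {
        return nullable;
    }

    // Hypothetical helper, used here only to show the memory saving.
    int validityBufferSize() {
        return validity.length;
    }

    // Proposed change 4: the flag bypasses the validity buffer entirely.
    boolean isNull(int index) {
        if (!nullable) {
            return false;
        }
        return (validity[index >> 3] & (1 << (index & 7))) == 0;
    }

    void set(int index, int value) {
        values[index] = value;
        if (nullable) {
            validity[index >> 3] |= (byte) (1 << (index & 7));
        }
    }

    void setNull(int index) {
        if (!nullable) {
            throw new UnsupportedOperationException("vector is non-nullable");
        }
        validity[index >> 3] &= (byte) ~(1 << (index & 7));
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("value at index " + index + " is null");
        }
        return values[index];
    }

    // For a non-nullable vector the null count can only be 0.
    int getNullCount() {
        if (!nullable) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < values.length; i++) {
            if (isNull(i)) {
                count++;
            }
        }
        return count;
    }
}
```

Because the flag is final, the branch on it is the same for the whole lifetime of a vector, which is the kind of check the proposal hopes the JIT compiler can handle cheaply. Whether that holds in practice would need to be measured.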