Hi Micah,

Thanks a lot for your valuable comments. Please see my comments inline.

> I'm a little concerned that this will change assumptions for at least some
> of the clients using the library (some might always rely on the validity
> buffer being present).

I understand your concern, and I share it.
IMO, clients should not depend on this assumption, as the specification
says "Arrays having a 0 null count may choose to not allocate the validity
bitmap." [1]
That being said, a safe approach would be to provide a global flag to
switch the feature on and off (as you suggested).
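For illustration, a client that follows the specification would treat a
missing validity bitmap as "no nulls" instead of assuming the buffer exists.
This is a minimal sketch that assumes a plain byte[] bitmap with the format's
least-significant-bit numbering. ValidityReader is a hypothetical helper, not
part of the Arrow Java API:

```java
/**
 * Hypothetical client-side helper (not part of the Arrow Java API): reads the
 * validity bit of a slot without assuming the validity bitmap was allocated.
 */
final class ValidityReader {
    private ValidityReader() {}

    /**
     * Returns true if the slot at {@code index} holds a non-null value.
     * A null bitmap is treated as "every slot is valid", which the columnar
     * format permits when the null count is 0.
     */
    static boolean isValid(byte[] validity, int index) {
        if (validity == null) {
            return true; // no bitmap allocated: the array contains no nulls
        }
        // Bitmaps use least-significant-bit numbering within each byte;
        // a set bit means the slot is valid (non-null).
        return ((validity[index >> 3] >> (index & 7)) & 1) == 1;
    }

    /** Counts the nulls among the first {@code length} slots. */
    static int nullCount(byte[] validity, int length) {
        int nulls = 0;
        for (int i = 0; i < length; i++) {
            if (!isValid(validity, i)) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        byte[] bitmap = {(byte) 0b0000_0101}; // slots 0 and 2 valid, 1 and 3 null
        System.out.println(nullCount(bitmap, 4)); // prints 2
        System.out.println(nullCount(null, 4));   // prints 0
    }
}
```

The only extra cost for such a client is one null check on the bitmap
reference, so code written this way keeps working whether or not the
validity buffer is allocated.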

> I think this is a good feature to have for the reasons you mentioned. It
> seems like there would need to be some sort of configuration bit to set for
> this behavior.

Good suggestion. We should be able to switch the feature on and off with a
single global flag.
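Roughly, the flag could look like the sketch below. The class
NonNullableFeature and the system property name
"arrow.enable_non_nullable_vectors" are hypothetical and used only for
illustration. They are not existing Arrow Java settings:

```java
/**
 * Hypothetical global switch for non-nullable vectors (illustration only).
 * Neither this class nor the property name below exists in Arrow Java.
 */
final class NonNullableFeature {
    /** Hypothetical property, e.g. -Darrow.enable_non_nullable_vectors=true */
    static final String PROPERTY = "arrow.enable_non_nullable_vectors";

    /**
     * Read once at class initialization. Boolean.getBoolean returns true only
     * if the property is set to "true" (case-insensitive), so the default is off.
     */
    static final boolean ENABLED = Boolean.getBoolean(PROPERTY);

    private NonNullableFeature() {}

    /** Decision rule: skip the validity buffer only if enabled AND the field is non-nullable. */
    static boolean skipValidityBuffer(boolean enabled, boolean fieldIsNullable) {
        return enabled && !fieldIsNullable;
    }

    /** Convenience overload that consults the global flag. */
    static boolean skipValidityBuffer(boolean fieldIsNullable) {
        return skipValidityBuffer(ENABLED, fieldIsNullable);
    }

    public static void main(String[] args) {
        System.out.println("feature enabled: " + ENABLED);
        System.out.println("skip validity for NOT NULL field: " + skipValidityBuffer(false));
    }
}
```

Because the flag is read into a static final field, the JIT can usually treat
it as a constant, so with the feature switched off the extra check should
cost little or nothing on hot paths.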

> But, I'd be worried about code complexity this would
> introduce.

I agree that code complexity is an important factor to consider.
IMO, our proposal should not require extensive code changes or
significantly increase code complexity.
To demonstrate this, we could put together a small experimental change.
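As a starting point, here is a rough sketch of the kind of change we mean.
It uses a simplified stand-alone vector rather than the real Arrow classes.
SketchIntVector and its methods are hypothetical. The sketch only mirrors the
proposal: a final nullability flag fixed at construction, an isNullable()
accessor, an empty (non-null) validity buffer for non-nullable vectors, and
validity handling bypassed when the flag is false:

```java
/**
 * Hypothetical, simplified fixed-width vector (NOT Arrow's IntVector) that
 * sketches the proposed design for non-nullable vectors.
 */
final class SketchIntVector {
    private static final byte[] EMPTY = new byte[0];

    private final boolean nullable; // taken from the field; never changes
    private final int[] values;
    private final byte[] validity;  // LSB-numbered bitmap; EMPTY when non-nullable

    SketchIntVector(int valueCount, boolean nullable) {
        this.nullable = nullable;
        this.values = new int[valueCount];
        // One bit per slot rounded up to whole bytes; all bits start at 0 (null).
        this.validity = nullable ? new byte[(valueCount + 7) / 8] : EMPTY;
    }

    boolean isNullable() {
        return nullable;
    }

    boolean isNull(int index) {
        if (!nullable) {
            return false; // bypass: the bitmap is never read
        }
        return ((validity[index >> 3] >> (index & 7)) & 1) == 0;
    }

    void set(int index, int value) {
        values[index] = value;
        if (nullable) {
            validity[index >> 3] |= (byte) (1 << (index & 7)); // mark slot valid
        }
    }

    void setNull(int index) {
        if (!nullable) {
            throw new IllegalStateException("cannot store null in a non-nullable vector");
        }
        validity[index >> 3] &= (byte) ~(1 << (index & 7)); // clear validity bit
    }

    int get(int index) {
        if (isNull(index)) { // constant false for non-nullable vectors
            throw new IllegalStateException("value at index " + index + " is null");
        }
        return values[index];
    }

    int getNullCount() {
        if (!nullable) {
            return 0; // guaranteed by construction
        }
        int nulls = 0;
        for (int i = 0; i < values.length; i++) {
            if (isNull(i)) {
                nulls++;
            }
        }
        return nulls;
    }

    int validityBufferSize() {
        return validity.length;
    }

    public static void main(String[] args) {
        SketchIntVector notNull = new SketchIntVector(4, false);
        notNull.set(0, 42);
        System.out.println(notNull.get(0) + " " + notNull.getNullCount()); // prints "42 0"
    }
}
```

The nullable path pays one extra check of a final field per call, which is
the potential cost mentioned in the original proposal. The non-nullable path
skips every read and write of the bitmap.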

Best,
Liya Fan

[1] https://arrow.apache.org/docs/format/Columnar.html#logical-types

On Wed, Mar 11, 2020 at 1:53 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Liya Fan,
> I'm a little concerned that this will change assumptions for at least some
> of the clients using the library (some might always rely on the validity
> buffer being present).
>
> I think this is a good feature to have for the reasons you mentioned. It
> seems like there would need to be some sort of configuration bit to set for
> this behavior. But, I'd be worried about code complexity this would
> introduce.
>
> Thanks,
> Micah
>
> On Tue, Mar 10, 2020 at 6:42 AM Fan Liya <liya.fa...@gmail.com> wrote:
>
> > Hi Wes,
> >
> > Thanks a lot for your quick reply.
> > I think what you described is almost exactly what we want to do in
> > Java. Whether we call it a "nullable" concept is not important.
> >
> > There may be only a few minor differences:
> > 1. In C++, the null_count is mutable, while in Java, once a vector is
> > constructed as non-nullable, its null count can only be 0.
> > 2. In C++, a non-nullable array's validity buffer is null, while in Java,
> > it is an empty buffer that cannot be changed.
> >
> > Best,
> > Liya Fan
> >
> > On Tue, Mar 10, 2020 at 9:26 PM Wes McKinney <wesmck...@gmail.com>
> > wrote:
> >
> > > hi Liya,
> > >
> > > In C++ we elect certain faster code paths when the null count is 0 or
> > > computed to be zero. When the null count is 0, we do not allocate a
> > > validity bitmap. And there is a "nullable" metadata-only flag at the
> > > Field level. Could the same kinds of optimizations be implemented in
> > > Java without introducing a "nullable" concept?
> > >
> > > - Wes
> > >
> > > On Tue, Mar 10, 2020 at 8:13 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > > >
> > > > Dear all,
> > > >
> > > > A non-nullable vector is one that is guaranteed to contain no nulls.
> > > > We want to support non-nullable vectors in Java.
> > > >
> > > > *Motivations:*
> > > > 1. Non-nullability is widely used in practice. For example, in a
> > > > database engine, a column can be declared as NOT NULL, so it cannot
> > > > contain null values.
> > > > 2. Non-nullable vectors have significant performance advantages
> > > > compared with their nullable counterparts, such as:
> > > >   1) the memory space of the validity buffer can be saved.
> > > >   2) manipulation of the validity buffer can be bypassed.
> > > >   3) some if-else branches can be replaced by sequential instructions
> > > > (by the JIT compiler), leading to higher throughput in the CPU pipeline.
> > > >
> > > > *Potential Cost:*
> > > > For nullable vectors, there can be extra checks against the nullability
> > > > flag, so we must change the code in a way that minimizes this cost.
> > > >
> > > > *Proposed Changes:*
> > > > 1. There is no need to create new vector classes. We add a final
> > > > boolean to the vector base classes as the nullability flag. The value
> > > > of the flag can be obtained from the field when creating the vector.
> > > > 2. Add a method "boolean isNullable()" to the root interface
> > > > ValueVector.
> > > > 3. If a vector is non-nullable, its validity buffer should be an empty
> > > > buffer (not null, so much of the existing logic can be left unchanged).
> > > > 4. For operations involving validity buffers (e.g. isNull, get, set), we
> > > > use the nullability flag to bypass manipulation of the validity buffer.
> > > >
> > > > Therefore, it should be possible to support the feature with only
> > > > small code changes.
> > > >
> > > > BTW, please note that similar behavior is already supported in C++.
> > > >
> > > > Would you please give your valuable feedback?
> > > >
> > > > Best,
> > > > Liya Fan
> > >
> >
>
