#3 is the correct behavior and how the code was meant to be written. I
don't see any problems with that pattern. It allows someone, if they so
decide, to null a value without having to rewrite the data. #3 is also
consistent with the behavior of all other vectors: null values can use up
space, but their data is undefined.
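
A minimal sketch of what "null a value without having to rewrite the data" means in practice. It does not use the Arrow API. It models a variable-width vector as plain arrays (offsets, data, validity), and the class and method names are hypothetical. Under #3, nulling only clears the validity bit. A layout that requires null slots to be empty (option 1) must instead shift every later byte and offset.

```java
/**
 * Hypothetical, simplified model of a variable-width vector (NOT the Arrow Java API).
 * Value i occupies data[offsets[i] .. offsets[i + 1]); validity[i] == false means null.
 */
final class VarWidthSketch {
    int[] offsets;      // length = valueCount + 1
    byte[] data;        // concatenated value bytes
    boolean[] validity; // true = non-null

    VarWidthSketch(int[] offsets, byte[] data, boolean[] validity) {
        this.offsets = offsets;
        this.data = data;
        this.validity = validity;
    }

    /** Option 3: clear the validity bit only; the old bytes stay put and become undefined. */
    void setNullKeepingData(int index) {
        validity[index] = false;
    }

    /** Option 1: the null slot must become empty, so later bytes and offsets shift left. */
    void setNullCompacting(int index) {
        validity[index] = false;
        int start = offsets[index];
        int end = offsets[index + 1];
        int removed = end - start;
        if (removed == 0) {
            return; // slot is already empty, nothing to move
        }
        int used = offsets[offsets.length - 1];
        // Move every byte after the nulled value down by `removed` positions
        // (System.arraycopy handles the overlapping ranges correctly).
        System.arraycopy(data, end, data, start, used - end);
        // Every later offset moves by the same amount.
        for (int i = index + 1; i < offsets.length; i++) {
            offsets[i] -= removed;
        }
    }
}
```

For the values "ab", "cde", "f" (offsets 0, 2, 5, 6), nulling the middle value with setNullCompacting leaves offsets 0, 2, 2, 3 and moves "f" down by three bytes, while setNullKeepingData leaves the offsets untouched. The compacting cost grows with everything stored after the nulled value, which is the copy/move problem raised for option 1.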

I don't agree with your comment on noncontiguous memory.


On Wed, Aug 28, 2019, 12:02 AM Fan Liya <liya.fa...@gmail.com> wrote:

> Dear all,
>
> In the discussion of this PR (https://github.com/apache/arrow/pull/5073),
> we are faced with a problem:
>
> Normally, in a VariableWidthVector (e.g. VarCharVector), a null value is
> supposed to take no space in the data buffer. In particular, for a null
> value, we have
>
> start index == end index
>
> where start index and end index are the start/end positions of the value in
> the data buffer. The same problem also applies to ListVector.
>
> However, it seems that in some scenarios a null value can take non-empty
> space (please see this comment
> https://github.com/apache/arrow/pull/5073#pullrequestreview-274215491).
>
> Since this is an important issue, we should make it clear in the
> specification. Otherwise, some unexpected problems may occur in client
> code.
>
> It seems we are faced with 3 options:
>
> 1. a null value always takes no space.
> 2. a null value can take non-empty space, and the content of the non-empty
> space is always 0.
> 3. a null value can take non-empty space, and the content of the non-empty
> space is undefined.
>
> Option 1 makes the data buffer of a VariableWidthVector a contiguous region
> (not interleaved with undefined regions), so optimizations can be applied.
> However, it may lead to memory copy/move (as indicated in the above comment
> https://github.com/apache/arrow/pull/5073#pullrequestreview-274215491).
>
> Option 3 can address the above problem of memory copy/move. However, it
> splits the data buffer into non-contiguous regions, so those optimizations
> cannot be performed. In addition, it may cause unexpected problems in client
> code.
>
> Option 2 seems like a trade-off between the two. However, it is not
> suitable for ListVector.
>
> Please give your valuable feedback.
>
> Best,
> Liya Fan
>
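
The three options differ only in what a null slot may contain. As a minimal sketch (again plain Java arrays rather than the Arrow API; NullSlotCheck and classify are hypothetical names), a checker can report the strictest option that a given buffer satisfies. Only a ONE_EMPTY result guarantees that the data buffer is not interleaved with bytes belonging to null slots, which is the property option 1 relies on for its optimizations.

```java
/**
 * Hypothetical checker over a simplified layout (NOT the Arrow Java API).
 * Value i occupies data[offsets[i] .. offsets[i + 1]); validity[i] == false means null.
 */
final class NullSlotCheck {
    /** Strictest option satisfied; a layout that meets option 1 also meets options 2 and 3. */
    enum Option { ONE_EMPTY, TWO_ZERO_FILLED, THREE_UNDEFINED }

    static Option classify(int[] offsets, byte[] data, boolean[] validity) {
        boolean allEmpty = true; // option 1: every null slot has start == end
        boolean allZero = true;  // option 2: every byte inside a null slot is 0
        for (int i = 0; i < validity.length; i++) {
            if (validity[i]) {
                continue; // only null slots matter here
            }
            int start = offsets[i];
            int end = offsets[i + 1];
            if (start != end) {
                allEmpty = false;
            }
            for (int j = start; j < end; j++) {
                if (data[j] != 0) {
                    allZero = false;
                    break;
                }
            }
        }
        if (allEmpty) {
            return Option.ONE_EMPTY;
        }
        if (allZero) {
            return Option.TWO_ZERO_FILLED;
        }
        return Option.THREE_UNDEFINED; // any content is allowed under option 3
    }
}
```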
