#3 is the correct behavior and how the code was meant to be written. I don't see any problems with that pattern. It allows someone, if they so decide, to null a value without having to rewrite the data. #3 is also consistent with the behavior of all other vectors: null values can use up space, but their data is undefined.
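To make the trade-off concrete, here is a minimal Python sketch that models a variable-width vector as three plain buffers (validity flags, offsets, data bytes). It is a toy model, not the Arrow Java implementation: the class `ToyVarWidthVector` and the methods `set_null_in_place` and `set_null_compact` are hypothetical names made up for illustration. The first method follows option 3 (only the validity flag changes), while the second enforces option 1 (start index == end index for a null) and therefore has to move every later byte and offset.

```python
class ToyVarWidthVector:
    """Toy stand-in for a variable-width vector (hypothetical, not Arrow's API)."""

    def __init__(self, values):
        self.validity = []        # one flag per slot, True means non-null
        self.offsets = [0]        # slot i occupies data[offsets[i]:offsets[i + 1]]
        self.data = bytearray()   # concatenated value bytes
        for value in values:
            if value is None:
                self.validity.append(False)
            else:
                self.validity.append(True)
                self.data += value
            self.offsets.append(len(self.data))

    def get(self, i):
        # Readers consult validity first, so bytes under a null slot are never read.
        if not self.validity[i]:
            return None
        return bytes(self.data[self.offsets[i]:self.offsets[i + 1]])

    def set_null_in_place(self, i):
        # Option 3: clear the validity flag only. Offsets and data are untouched,
        # so the slot still spans bytes whose content is now undefined.
        self.validity[i] = False

    def set_null_compact(self, i):
        # Option 1: force start == end for the null slot. This removes the slot's
        # bytes and shifts every later offset, i.e. a memory move.
        start, end = self.offsets[i], self.offsets[i + 1]
        width = end - start
        del self.data[start:end]
        for j in range(i + 1, len(self.offsets)):
            self.offsets[j] -= width
        self.validity[i] = False


in_place = ToyVarWidthVector([b"ab", b"cde", b"f"])
in_place.set_null_in_place(1)

compacted = ToyVarWidthVector([b"ab", b"cde", b"f"])
compacted.set_null_compact(1)
```

In this sketch both vectors read back the same values through `get`; they differ only in whether the buffers had to be rewritten to null slot 1.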
I don't agree with your comment on noncontiguous memory.

On Wed, Aug 28, 2019, 12:02 AM Fan Liya <liya.fa...@gmail.com> wrote:

> Dear all,
>
> In the discussion of this PR (https://github.com/apache/arrow/pull/5073),
> we are faced with a problem:
>
> Normally, in a VariableWidthVector (e.g. VarCharVector), a null value is
> supposed to take no space in the data buffer. In particular, for a null
> value, we have
>
> start index == end index
>
> Where start index and end index are the start/end positions of the value in
> the data buffer. This problem is also related to the ListVector.
>
> However, it seems that for some scenarios, a null value can take non-empty
> space (please see this comment
> https://github.com/apache/arrow/pull/5073#pullrequestreview-274215491).
>
> Since this is an important issue, we should make it clear in the
> specification. Otherwise, some unexpected problems may occur in client
> code.
>
> It seems we are faced with 3 options:
>
> 1. a null value always takes no space.
> 2. a null value can take non-empty space, and the content of the non-empty
> space is always 0.
> 3. a null value can take non-empty space, and the content of the non-empty
> space is undefined.
>
> Option 1 makes the data buffer of a VariableWidthVector a continuous region
> (not interleaved by undefined regions). So optimization can be applied.
> However, it may lead to memory copy/move (as indicated in the above comment
> https://github.com/apache/arrow/pull/5073#pullrequestreview-274215491)
>
> Option 3 can address the above problem of memory copy/move. However, it
> splits memory into un-continuous regions, so optimizations cannot be
> performed. In addition, it may cause unexpected problems in client code.
>
> Option 2 seems like a trade-off between the two. However, it is not
> suitable for ListVector.
>
> Please give your valuable feedback.
>
> Best,
> Liya Fan