Dear all,

In the discussion of this PR (https://github.com/apache/arrow/pull/5073), we are faced with a problem:
Normally, in a VariableWidthVector (e.g. VarCharVector), a null value is supposed to take no space in the data buffer. In particular, for a null value, we have

    start index == end index

where start index and end index are the start and end positions of the value in the data buffer. The same question applies to ListVector.

However, it seems that in some scenarios a null value can take non-empty space (please see this comment: https://github.com/apache/arrow/pull/5073#pullrequestreview-274215491).

Since this is an important issue, we should make it clear in the specification. Otherwise, unexpected problems may occur in client code.

It seems we are faced with 3 options:

1. A null value always takes no space.
2. A null value can take non-empty space, and the content of that space is always 0.
3. A null value can take non-empty space, and the content of that space is undefined.

Option 1 makes the data buffer of a VariableWidthVector a contiguous region (not interleaved with undefined regions), so optimizations can be applied. However, it may lead to memory copies/moves (as indicated in the comment linked above).

Option 3 avoids the memory copies/moves. However, it splits the data buffer into non-contiguous regions, so those optimizations cannot be performed. In addition, it may cause unexpected problems in client code.

Option 2 seems like a trade-off between the two. However, it is not suitable for ListVector.

Please give your valuable feedback.

Best,
Liya Fan
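
P.S. To make the "start index == end index" condition concrete, here is a minimal sketch in plain Java. It does not use the Arrow API: the int[] and boolean[] arrays are only stand-ins for the offset and validity buffers, and the class and method names (NullSlotSketch, nullsTakeNoSpace) are hypothetical, chosen for illustration.

```java
public class NullSlotSketch {

    /**
     * Returns true if every null slot i satisfies offsets[i] == offsets[i + 1],
     * i.e. no null value occupies any bytes in the data buffer (option 1).
     * offsets must have valid.length + 1 entries.
     */
    public static boolean nullsTakeNoSpace(int[] offsets, boolean[] valid) {
        for (int i = 0; i < valid.length; i++) {
            if (!valid[i] && offsets[i] != offsets[i + 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Three slots: "ab", null, "cde"
        boolean[] valid = {true, false, true};

        // Option 1: the null slot is empty (start index == end index == 2).
        int[] compact = {0, 2, 2, 5};
        System.out.println(nullsTakeNoSpace(compact, valid)); // true

        // Options 2/3: the null slot spans bytes [2, 4) of the data buffer.
        int[] padded = {0, 2, 4, 7};
        System.out.println(nullsTakeNoSpace(padded, valid)); // false
    }
}
```

Under option 1, client code could rely on such a check always passing. Under options 2 and 3 it could not, and under option 3 it also could not assume anything about the bytes inside the null slot.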