Tobias Zagorni created ARROW-17956:
--------------------------------------
Summary: [C++] RandomArrayGenerator does not properly generate
ListArrays with Nulls
Key: ARROW-17956
URL: https://issues.apache.org/jira/browse/ARROW-17956
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Tobias Zagorni
Assignee: Tobias Zagorni
There are multiple problems with the {{OffsetsFromLengthsArray}} method:
* The method assumes that the first and last length values in the input are
never null. This does not hold where the method is used in
GENERATE_LIST_CASE, whose input is generated entirely at random according to
null_probability:
[https://github.com/apache/arrow/blob/ed36fcd218d381bd7420f1b762a28c5feea4665f/cpp/src/arrow/testing/random.cc#L730]
* The SetBit call for non-null items is off by one. The index variable
represents the index of the next offset, which is based on the current
element's length, but the validity bit should still be set for the current
element.
* I don't see what effect the {{force_empty_nulls}} argument should have. I
think the desired effect, that null items also have zero length, is always
given, based on how the method is implemented. Please correct me if I'm wrong.
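
To illustrate the behavior described in the points above, here is a minimal standalone sketch. It is not the Arrow implementation and does not use the Arrow API: {{ListLayout}} and {{OffsetsFromLengthsSketch}} are hypothetical names, and nulls are modeled with std::optional and std::vector<bool> instead of Arrow buffers and a validity bitmap. It only shows offsets being built so that a null may appear at any position (including first and last), the validity flag is set for the current element, and null items always have zero length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type for this sketch; not an Arrow class.
struct ListLayout {
  std::vector<int32_t> offsets;  // n + 1 entries, non-decreasing
  std::vector<bool> validity;    // n entries, true means non-null list
};

// Builds list offsets from per-element lengths.
// std::nullopt marks a null list; it may appear anywhere in the input.
ListLayout OffsetsFromLengthsSketch(
    const std::vector<std::optional<int32_t>>& lengths) {
  ListLayout out;
  out.offsets.reserve(lengths.size() + 1);
  out.validity.assign(lengths.size(), false);

  int32_t next_offset = 0;
  out.offsets.push_back(next_offset);  // offsets[0] is always 0

  for (std::size_t i = 0; i < lengths.size(); ++i) {
    if (lengths[i].has_value()) {
      assert(*lengths[i] >= 0);
      next_offset += *lengths[i];
      // The validity flag belongs to the current element i,
      // not to the index of the offset that is about to be written (i + 1).
      out.validity[i] = true;
    }
    // For a null element next_offset is unchanged, so the list is empty.
    out.offsets.push_back(next_offset);
  }
  return out;
}
```

With the lengths [null, 2, null, 3, null], this sketch yields the offsets 0, 0, 2, 2, 5, 5 and the validity flags false, true, false, true, false. Because a null never advances the running offset, null items are empty without any extra flag in this sketch.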