Neville Dipale created ARROW-9728:
-------------------------------------
Summary: [Rust] [Parquet] Compute nested spacing
Key: ARROW-9728
URL: https://issues.apache.org/jira/browse/ARROW-9728
Project: Apache Arrow
Issue Type: Sub-task
Components: Rust
Affects Versions: 1.0.0
Reporter: Neville Dipale
When computing definition levels for deeply nested arrays that include lists,
the definition levels are correctly calculated, but they are not translated
into correct indexes for the eventual primitive arrays.
For example, an int32 array could have no null values, but be a child of a list
that has null values. Suppose the first 5 values of the int32 array are members
of the first list item (i.e. list_array[0] = [1,2,3,4,5]), and that list is
itself a child of a struct that is null at that index. In that case, all 5
values of the int32 array *should* be skipped. Further, the list's definition
and repetition levels will be represented by 1 slot instead of 5.
The current logic does not handle this case, and can end up slicing the int32
array incorrectly (sometimes including some of those first 5 values).
This Jira is for the work necessary to compute the index into the eventual leaf
arrays correctly.
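To make the expected behaviour concrete, here is a minimal sketch of the
computation for a single struct<list<int32>> column. It does not use the arrow
or parquet crates, and it is not the writer's actual code: the LeafLevels type,
the compute_leaf_levels function and the plain slices for validity and offsets
are all hypothetical stand-ins. It assumes a nullable struct, a nullable list
and non-null int32 items, so the maximum definition level is 3.

```rust
/// Hypothetical result type: the levels to write for the leaf column, plus
/// the indexes into the int32 child array that should actually be written.
#[derive(Debug, Default, PartialEq)]
pub struct LeafLevels {
    pub def_levels: Vec<i16>,
    pub rep_levels: Vec<i16>,
    pub value_indices: Vec<usize>,
}

/// Sketch for struct<list<int32>> (struct and list nullable, items non-null).
/// `struct_valid[i]` and `list_valid[i]` are validity flags for slot `i`, and
/// `offsets` are the list offsets into the int32 child (length = slots + 1).
pub fn compute_leaf_levels(
    struct_valid: &[bool],
    list_valid: &[bool],
    offsets: &[i32],
) -> LeafLevels {
    assert_eq!(struct_valid.len(), list_valid.len());
    assert_eq!(offsets.len(), struct_valid.len() + 1);

    let mut out = LeafLevels::default();
    for i in 0..struct_valid.len() {
        let start = offsets[i] as usize;
        let end = offsets[i + 1] as usize;
        if !struct_valid[i] {
            // Null struct: one slot at def level 0. Any child values in
            // start..end are skipped, even if the child has no nulls.
            out.def_levels.push(0);
            out.rep_levels.push(0);
        } else if !list_valid[i] {
            // Struct present, list null: one slot at def level 1.
            out.def_levels.push(1);
            out.rep_levels.push(0);
        } else if start == end {
            // List present but empty: one slot at def level 2.
            out.def_levels.push(2);
            out.rep_levels.push(0);
        } else {
            // One slot per value; only the first value starts a new record.
            for (k, idx) in (start..end).enumerate() {
                out.def_levels.push(3);
                out.rep_levels.push(if k == 0 { 0 } else { 1 });
                out.value_indices.push(idx);
            }
        }
    }
    out
}
```

With struct_valid = [false, true], list_valid = [true, true] and
offsets = [0, 5, 7], the first 5 int32 values collapse into a single slot with
definition level 0, and only indexes 5 and 6 of the int32 array are selected
for writing.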
I started doing it as part of the initial writer PR, but it's complex and is
blocking progress.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)