Neville Dipale created ARROW-9728:
-------------------------------------

             Summary: [Rust] [Parquet] Compute nested spacing
                 Key: ARROW-9728
                 URL: https://issues.apache.org/jira/browse/ARROW-9728
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: Rust
    Affects Versions: 1.0.0
            Reporter: Neville Dipale


When computing definition levels for deeply nested arrays that include lists, 
the definition levels are calculated correctly, but they are not translated 
into correct indexes into the eventual primitive arrays.

For example, an int32 array could have no null values, but be a child of a list 
that has null values. Say the first 5 values of the int32 array are members 
of the first list item (i.e. list_array[0] = [1,2,3,4,5]), and that list is 
itself a child of a struct whose slot at that index is null. In that case all 5 
values of the int32 array *should* be skipped. Further, the list's definition 
and repetition levels will be represented by 1 slot instead of 5.

The current logic does not handle this case, and can end up slicing the 
int32 array incorrectly (sometimes including some of those first 5 values).
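
To make the expected behaviour concrete, here is a minimal sketch in plain Rust. 
It is not the parquet crate's API: the function name, its parameters, and the 
fixed schema (nullable struct > nullable list > non-null int32, so the maximum 
definition level is 3) are all assumptions made for illustration only.

```rust
/// Hypothetical helper, for illustration only (not part of the parquet crate).
/// Models a nullable struct whose single child is a nullable list of non-null
/// int32 values. `offsets` follows the Arrow list layout: slot `i` owns the
/// child values in `offsets[i]..offsets[i + 1]`.
///
/// Returns (definition levels, repetition levels, indexes into the int32 leaf
/// array of the values that should actually be written).
fn compute_leaf_levels(
    struct_valid: &[bool],
    list_valid: &[bool],
    offsets: &[usize],
) -> (Vec<i16>, Vec<i16>, Vec<usize>) {
    assert_eq!(struct_valid.len(), list_valid.len());
    assert_eq!(offsets.len(), struct_valid.len() + 1);

    let mut def_levels = Vec::new();
    let mut rep_levels = Vec::new();
    let mut value_indexes = Vec::new();

    for i in 0..struct_valid.len() {
        let (start, end) = (offsets[i], offsets[i + 1]);
        if !struct_valid[i] {
            // Null struct: one level slot, and every child value in
            // start..end is skipped even though the list has values.
            def_levels.push(0);
            rep_levels.push(0);
        } else if !list_valid[i] {
            // Struct is present but the list is null.
            def_levels.push(1);
            rep_levels.push(0);
        } else if start == end {
            // List is present but empty.
            def_levels.push(2);
            rep_levels.push(0);
        } else {
            // One level slot per child value; only the first value of the
            // list starts a new record (repetition level 0).
            for (k, idx) in (start..end).enumerate() {
                def_levels.push(3);
                rep_levels.push(if k == 0 { 0 } else { 1 });
                value_indexes.push(idx);
            }
        }
    }

    (def_levels, rep_levels, value_indexes)
}
```

With struct validity [false, true], list validity [true, true] and offsets 
[0, 5, 7], this gives definition levels [0, 3, 3], repetition levels [0, 0, 1] 
and leaf indexes [5, 6]: the first 5 int32 values collapse into a single level 
slot and are not written.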

This Jira is for the work necessary to compute the index into the eventual leaf 
arrays correctly.

I started doing it as part of the initial writer PR, but it's complex and is 
blocking progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
