[ https://issues.apache.org/jira/browse/ARROW-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185332#comment-17185332 ]
Antoine Pitrou commented on ARROW-8494:
---------------------------------------
If I understand correctly, for a non-list nullable field, we only need to
update the null bitmap:
* if def level >= field's def level, append non-null
* otherwise, append null
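The non-list rule above needs only one comparison per level entry, which is what keeps this case cheap. Below is a minimal C++ sketch under a few assumptions: levels are int16_t (as in Parquet C++), the field has no list ancestors (so each def level entry maps to exactly one Arrow slot), and the output is an Arrow-style validity bitmap (LSB bit order, 1 = non-null). `BuildValidityBitmap` is a hypothetical name, not an Arrow or Parquet API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): builds an Arrow-style validity
// bitmap (LSB bit order, 1 = non-null) for a nullable field with no list
// ancestors, so every def level entry corresponds to exactly one slot.
// Returns the number of nulls.
int64_t BuildValidityBitmap(const std::vector<int16_t>& def_levels,
                            int16_t field_def_level,
                            std::vector<uint8_t>* validity) {
  assert(validity != nullptr);
  validity->assign((def_levels.size() + 7) / 8, 0);
  int64_t null_count = 0;
  for (std::size_t i = 0; i < def_levels.size(); ++i) {
    // def level >= field's def level: the value is present (non-null).
    // The comparison yields 0 or 1, so the body has no data-dependent
    // branch, which can help compiler auto-vectorization.
    const uint8_t present = def_levels[i] >= field_def_level;
    (*validity)[i / 8] |= static_cast<uint8_t>(present << (i % 8));
    null_count += 1 - present;
  }
  return null_count;
}
```

For example, with `field_def_level = 2` (a nullable field inside a nullable struct), def levels `{2, 1, 0, 2}` give validity bits `1, 0, 0, 1`: def level 1 means the struct is present but the field is null, and 0 means the struct itself is null.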
For a non-nullable list field, we must update the offsets:
* if rep level < field's rep level and def level < field's def level, append current_offset (empty list)
* if rep level < field's rep level and def level >= field's def level, append current_offset++ (first item in new list)
* otherwise, just current_offset++ (next item in same list)
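The offset rules can be sketched the same way. This C++ sketch assumes a non-nullable list with no list or nullable ancestors and a non-repeated child, so rep levels never exceed the list's own rep level; `BuildListOffsets` is a hypothetical name. The final `push_back` after the loop closes the last list. The rules above leave that step implicit, but an Arrow offsets buffer needs `num_lists + 1` entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): builds the Arrow offsets buffer
// (num_lists + 1 entries) for a non-nullable list field with no list or
// nullable ancestors and a non-repeated child.
//   field_rep_level: rep level of the list's repeated node
//   field_def_level: def level at which the list holds at least one item
std::vector<int32_t> BuildListOffsets(const std::vector<int16_t>& def_levels,
                                      const std::vector<int16_t>& rep_levels,
                                      int16_t field_def_level,
                                      int16_t field_rep_level) {
  assert(def_levels.size() == rep_levels.size());
  std::vector<int32_t> offsets;
  int32_t current_offset = 0;
  for (std::size_t i = 0; i < def_levels.size(); ++i) {
    // Assumption of this sketch: the child is not repeated.
    assert(rep_levels[i] <= field_rep_level);
    if (rep_levels[i] < field_rep_level) {
      // A new list starts here: record its start offset.
      offsets.push_back(current_offset);
      if (def_levels[i] >= field_def_level) {
        ++current_offset;  // This entry is the first item of the new list.
      }
      // Otherwise the new list is empty and the offset stays unchanged.
    } else {
      ++current_offset;  // Next item in the same list.
    }
  }
  // Close the last list so that list k spans [offsets[k], offsets[k + 1]).
  offsets.push_back(current_offset);
  return offsets;
}
```

For a required `list<optional int32>` holding `[1, 2]`, `[]`, `[3]`, `[null, 4]`, the (rep, def) pairs are (0,2) (1,2) (0,0) (0,2) (0,1) (1,2). With `field_rep_level = 1` and `field_def_level = 1`, the sketch yields offsets `{0, 2, 2, 3, 5}`.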
For a nullable list field, the ancestor_def_level must also be taken into
account?
So non-list fields are easy, while list fields need more sophisticated logic
that might be harder to do efficiently.
> [C++] Implement vectorized array reassembly logic
> -------------------------------------------------
>
> Key: ARROW-8494
> URL: https://issues.apache.org/jira/browse/ARROW-8494
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: C++
> Reporter: Micah Kornfield
> Assignee: Micah Kornfield
> Priority: Major
>
> This logic would attempt to create the data necessary for each field by
> passing through the levels once for each field. It is expected that, due to
> SIMD, this will perform better for shallowly nested data, but that, due to
> repeated computation, it might perform worse for deeply nested data that
> includes List types.
>
> At a high level the logic would be structured as:
> {{for each field:}}
> {{ for each rep/def level entry:}}
> {{ update null bitmask and offsets.}}
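The quoted per-field loop can be sketched by running one pass over the shared level buffers for each field. This is a hypothetical C++ sketch: `FieldDesc`, `FieldOutput` and `Reassemble` are illustrative names, and validity is kept as `std::vector<bool>` for readability rather than as a bitmap. It reuses the comment's rules for list offsets and non-list validity. The `ancestor_def_level` filter is an assumption of this sketch, not part of the issue text. It lets a non-list child of a list skip entries where the list is empty, and it handles only one level of list nesting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical field descriptor (not an Arrow type).
struct FieldDesc {
  bool is_list;                // list: produce offsets; otherwise: validity
  int16_t def_level;           // def level where the field is non-null/non-empty
  int16_t rep_level;           // rep level of the repeated node (lists only)
  int16_t ancestor_def_level;  // non-list only: entries below this have no slot
};

struct FieldOutput {
  std::vector<bool> validity;    // filled for non-list fields
  std::vector<int32_t> offsets;  // filled for list fields
};

// One full pass over the shared level buffers per field.
std::vector<FieldOutput> Reassemble(const std::vector<FieldDesc>& fields,
                                    const std::vector<int16_t>& def_levels,
                                    const std::vector<int16_t>& rep_levels) {
  assert(def_levels.size() == rep_levels.size());
  std::vector<FieldOutput> outputs(fields.size());
  for (std::size_t f = 0; f < fields.size(); ++f) {  // for each field
    const FieldDesc& field = fields[f];
    FieldOutput& out = outputs[f];
    int32_t current_offset = 0;
    for (std::size_t i = 0; i < def_levels.size(); ++i) {  // for each entry
      if (field.is_list) {
        // Offset rules from the comment (assumes a non-repeated child).
        if (rep_levels[i] < field.rep_level) {
          out.offsets.push_back(current_offset);
          if (def_levels[i] >= field.def_level) ++current_offset;
        } else {
          ++current_offset;
        }
      } else if (def_levels[i] >= field.ancestor_def_level) {
        // The entry has a slot for this field: record null or non-null.
        out.validity.push_back(def_levels[i] >= field.def_level);
      }
    }
    if (field.is_list) out.offsets.push_back(current_offset);  // close last list
  }
  return outputs;
}
```

Each field rescans every level entry, which is the repeated work the description expects to hurt deeply nested List types. In exchange, each inner loop depends only on its own field's thresholds, which is what the description hopes SIMD can exploit.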