[https://issues.apache.org/jira/browse/ARROW-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185320#comment-17185320]
Antoine Pitrou commented on ARROW-8495:
---------------------------------------
Are you expecting to switch between both approaches (vectorized /
non-vectorized) depending on heuristics?
It seems to me that in most, if not all, cases the vectorized approach should be
faster, especially if it operates on limited-size chunks so that we make better
use of the CPU cache:
{code:java}
for each cache-sized chunk (e.g. 1K levels):
    for each field:
        for each rep/def level entry in chunk:
            update null bitmask and offsets
{code}
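A minimal Java sketch of the chunked loop above, assuming a single leaf column with schema {{optional list<optional int32>}} (standard three-level LIST encoding, so max def level 3 and max rep level 1). The class and field names are hypothetical, plain Java lists stand in for Arrow validity bitmaps and offset buffers, and the leaf values themselves are not copied. The point is only the loop order: each field makes its own pass over a whole chunk before the next field starts.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: chunked "vectorized" reassembly of one Parquet leaf
// column whose schema is assumed to be optional list<optional int32>.
//   def 0 = null list, 1 = empty list, 2 = null element, 3 = present element
//   rep 0 = starts a new list, 1 = continues the current list
class ChunkedReassembly {
    final int chunkSize; // number of levels per cache-sized chunk (e.g. 1024)

    // Boolean/Integer lists stand in for Arrow validity bitmaps and offsets.
    final List<Boolean> listValid = new ArrayList<>();
    final List<Integer> offsets = new ArrayList<>(List.of(0));
    final List<Boolean> elemValid = new ArrayList<>();

    ChunkedReassembly(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    void reassemble(int[] def, int[] rep) {
        for (int start = 0; start < def.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, def.length);

            // Field 1: the list array (validity + offsets) over the whole chunk.
            for (int i = start; i < end; i++) {
                if (rep[i] == 0) {                 // new list slot
                    listValid.add(def[i] >= 1);    // def 0: the list itself is null
                    offsets.add(offsets.get(offsets.size() - 1));
                }
                if (def[i] >= 2) {                 // entry carries an element slot
                    int last = offsets.size() - 1;
                    offsets.set(last, offsets.get(last) + 1);
                }
            }

            // Field 2: the int32 element array (validity only) over the same chunk.
            for (int i = start; i < end; i++) {
                if (def[i] >= 2) {
                    elemValid.add(def[i] == 3);    // def 2: null element
                }
            }
        }
    }
}
{code}

For example, the records {{[[1, null], null, [], [5]]}} are encoded as def levels {{{3, 2, 0, 1, 3}}} and rep levels {{{0, 1, 0, 0, 0}}}; reassembling them yields list validity {{[true, false, true, true]}}, offsets {{[0, 2, 2, 2, 3]}} and element validity {{[true, false, true]}}, whatever the chunk size, because the per-field state carries over between chunks.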
Also, I assume this is for a single Parquet leaf node, right?
> [C++] Implement non-vectorized array reconstruction logic.
> ----------------------------------------------------------
>
> Key: ARROW-8495
> URL: https://issues.apache.org/jira/browse/ARROW-8495
> Project: Apache Arrow
> Issue Type: Sub-task
> Reporter: Micah Kornfield
> Priority: Major
>
> In contrast to the "Vectorized" reassembly this would scan:
>
> {{for each rep/def level entry:}}
> {{ for each field:}}
> {{ update null bitmask and offsets.}}
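For comparison, a sketch of the non-vectorized scan quoted above, under the same assumptions (a single leaf with schema {{optional list<optional int32>}}, Java lists in place of Arrow buffers). Each Arrow array along the leaf's path gets a hypothetical {{FieldBuilder}}, and every rep/def level entry is pushed through all builders before moving on to the next entry.

{code:java}
import java.util.ArrayList;
import java.util.List;

// One builder per Arrow array along the leaf's path (hypothetical interface).
interface FieldBuilder {
    void consume(int def, int rep);
}

// Outer optional list: validity + offsets.
// Assumed levels: def 0 = null list, 1 = empty, 2 = null element, 3 = element.
class ListFieldBuilder implements FieldBuilder {
    final List<Boolean> valid = new ArrayList<>();
    final List<Integer> offsets = new ArrayList<>(List.of(0));

    public void consume(int def, int rep) {
        if (rep == 0) {                 // rep 0 starts a new list slot
            valid.add(def >= 1);        // def 0 means the list itself is null
            offsets.add(offsets.get(offsets.size() - 1));
        }
        if (def >= 2) {                 // entry carries an element slot
            int last = offsets.size() - 1;
            offsets.set(last, offsets.get(last) + 1);
        }
    }
}

// Inner optional int32 element: validity only.
class ElementFieldBuilder implements FieldBuilder {
    final List<Boolean> valid = new ArrayList<>();

    public void consume(int def, int rep) {
        if (def >= 2) {
            valid.add(def == 3);        // def 2 = null element, 3 = present
        }
    }
}

class RowWiseReassembly {
    // Non-vectorized order: outer loop over level entries, inner loop over fields.
    static void reassemble(int[] def, int[] rep, List<FieldBuilder> fields) {
        for (int i = 0; i < def.length; i++) {
            for (FieldBuilder field : fields) {
                field.consume(def[i], rep[i]);
            }
        }
    }
}
{code}

Run on the same levels ({{def = {3, 2, 0, 1, 3}}}, {{rep = {0, 1, 0, 0, 0}}}) it produces the same buffers as the chunked version; the only difference is the loop order, i.e. here every entry touches every field's state in turn, which is the access pattern the cache-chunking argument above is about.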