[
https://issues.apache.org/jira/browse/PARQUET-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089347#comment-17089347
]
Micah Kornfield commented on PARQUET-1841:
------------------------------------------
I've been using parquet-arrow-reader-writer-benchmark to assess changes
(BM_ReadColumn<True, *>); we should see whether there is an impact on the
overall process. But those numbers look nice.
{quote}
I don't find a path to speed just using shuffle/permute API.
{quote}
I think the algorithm would probably have to use lookup tables per nibble/byte
of the validity bitmap (or a shuffle-only approach might not be possible at all).
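
To make the lookup-table idea concrete, here is a minimal sketch, not the parquet-cpp implementation. It assumes an LSB-first validity bitmap (as Arrow uses) and precomputes, for each possible validity byte, which dense source element each of the 8 output slots reads. The names SpacedLut, BuildSpacedLut and ExpandSpaced are hypothetical. The inner loop is scalar for portability; for 1-byte values a SIMD variant could presumably pass a table row straight to a byte shuffle such as _mm_shuffle_epi8, which zeroes any lane whose index byte has its high bit set, so the -1 entries would yield zeros for null slots. Wider value types would need a scaled mask.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table: one row per possible validity byte.
struct SpacedLut {
  // src_index[bits][slot] is the offset into the dense input that output
  // slot `slot` should read, or -1 if that slot is null.
  std::array<std::array<int8_t, 8>, 256> src_index;
  // Number of set bits, i.e. how many dense values the byte consumes.
  std::array<uint8_t, 256> popcount;
};

inline SpacedLut BuildSpacedLut() {
  SpacedLut lut{};
  for (int bits = 0; bits < 256; ++bits) {
    int next = 0;
    for (int slot = 0; slot < 8; ++slot) {
      const bool valid = ((bits >> slot) & 1) != 0;  // LSB-first bit order
      lut.src_index[bits][slot] = static_cast<int8_t>(valid ? next++ : -1);
    }
    lut.popcount[bits] = static_cast<uint8_t>(next);
  }
  return lut;
}

// Expands `dense` (non-null values only) into `out` (num_slots entries),
// writing T{} into null slots. Assumes the bitmap matches the dense input.
template <typename T>
void ExpandSpaced(const T* dense, const uint8_t* valid_bits,
                  std::size_t num_slots, T* out, const SpacedLut& lut) {
  std::size_t consumed = 0;
  for (std::size_t base = 0; base < num_slots; base += 8) {
    const uint8_t bits = valid_bits[base / 8];
    const auto& idx = lut.src_index[bits];
    const std::size_t n = std::min<std::size_t>(8, num_slots - base);
    // Scalar stand-in for a shuffle: each slot picks its source by table.
    for (std::size_t slot = 0; slot < n; ++slot) {
      out[base + slot] = idx[slot] >= 0 ? dense[consumed + idx[slot]] : T{};
    }
    consumed += lut.popcount[bits];
  }
}
```

A per-nibble table would have 16 rows of 4 entries instead of 256 rows of 8, trading table size for more iterations. Whether either variant beats the existing code would have to be measured with the benchmark mentioned above.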
> [C++] Experiment to see if using SIMD shuffle operations for DecodeSpaced
> improves performance
> ----------------------------------------------------------------------------------------------
>
> Key: PARQUET-1841
> URL: https://issues.apache.org/jira/browse/PARQUET-1841
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Micah Kornfield
> Assignee: Micah Kornfield
> Priority: Major
> Attachments: image-2020-04-14-15-01-48-222.png
>
>
> Follow-up from PARQUET-1840: in current benchmarks, removing the memset
> somehow either has no impact or is slightly worse. We should investigate
> using SIMD operations to speed up spacing.
>
> As part of this we can see if moving the memset to only cover uninitialized
> values after moving all required values provides any speedup.
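
One way to read the memset suggestion in the description, sketched under assumptions rather than taken from parquet-cpp: the dense values already sit at the front of the output buffer, are moved to their final slots from back to front, and only the null slots are zeroed, instead of clearing the whole buffer up front. SpaceInPlace is a hypothetical name, and the bitmap is assumed to be LSB-first with exactly num_values bits set among the first num_slots.

```cpp
#include <cstddef>
#include <cstdint>

// Spreads the first `num_values` entries of `buffer` over `num_slots` slots
// according to `valid_bits`, zeroing only the slots that are null.
template <typename T>
void SpaceInPlace(T* buffer, std::size_t num_slots, std::size_t num_values,
                  const uint8_t* valid_bits) {
  std::size_t src = num_values;  // one past the next dense value to move
  // Walk backwards so a dense value is never overwritten before it is moved.
  for (std::size_t slot = num_slots; slot-- > 0;) {
    const bool valid = ((valid_bits[slot / 8] >> (slot % 8)) & 1) != 0;
    if (valid) {
      buffer[slot] = buffer[--src];
    } else {
      buffer[slot] = T{};  // zero just this null slot, no up-front memset
    }
  }
}
```

For example, a buffer holding {1, 2, 3, ?, ?, ?} with slots 0, 3 and 5 valid ends up as {1, 0, 0, 2, 0, 3}. The per-slot branch may cost more than it saves, which is consistent with the observation that dropping the memset did not help.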
--
This message was sent by Atlassian Jira
(v8.3.4#803005)