[jira] [Commented] (PARQUET-1841) [C++] Experiment to see if using SIMD shuffle operations for DecodeSpaced improves performance

Frank Du (Jira) Sun, 26 Apr 2020 20:24:48 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092967#comment-17092967
 ]


Frank Du commented on PARQUET-1841:
-----------------------------------

I have a draft SSE decode version that uses a small shuffle lookup table for 
the uint32/uint64 paths. On the newly created spaced benchmark it nearly 
doubles throughput for epi32 and gives about a 45% improvement for epi64 
(which makes sense to me, as one m128i block can hold only two epi64 items). 
For the uint8 path the table would be really large (2^16 m128i entries), so I 
don't think we should apply this optimization there. I can work on the encode 
path with a similar approach if this looks like a good direction.
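
For illustration, a minimal sketch of the shuffle-table idea for the 32-bit 
path might look like the following. This is not the draft patch itself: the 
names BuildShuffleTable, Shuffle16 and DecodeSpaced32 are hypothetical, the 
validity bitmap is assumed to be LSB-first with one bit per output slot, and 
the slot count is assumed to be a multiple of 4. When SSSE3 is not enabled at 
compile time, a scalar loop with the same pshufb semantics stands in for the 
intrinsic.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__SSSE3__)
#include <tmmintrin.h>
#endif

// One 16-byte shuffle mask per 4-bit validity pattern (16 masks in total).
using Mask = std::array<uint8_t, 16>;

// Hypothetical helper: builds the lookup table. A mask byte of 0x80 makes
// pshufb write zero, which fills the null slots.
std::array<Mask, 16> BuildShuffleTable() {
  std::array<Mask, 16> table{};
  for (int bits = 0; bits < 16; ++bits) {
    int src = 0;  // index of the next packed (non-null) input value
    for (int lane = 0; lane < 4; ++lane) {
      const bool valid = ((bits >> lane) & 1) != 0;
      for (int b = 0; b < 4; ++b) {
        table[bits][lane * 4 + b] =
            static_cast<uint8_t>(valid ? src * 4 + b : 0x80);
      }
      if (valid) ++src;
    }
  }
  return table;
}

// Hypothetical helper: applies a mask to 16 input bytes (pshufb semantics).
void Shuffle16(const uint8_t* in, const Mask& mask, uint8_t* out) {
#if defined(__SSSE3__)
  const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
  const __m128i m =
      _mm_loadu_si128(reinterpret_cast<const __m128i*>(mask.data()));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_shuffle_epi8(v, m));
#else
  // Scalar stand-in used when SSSE3 is not enabled.
  for (int i = 0; i < 16; ++i) {
    out[i] = (mask[i] & 0x80) ? 0 : in[mask[i] & 0x0F];
  }
#endif
}

// Expands num_packed values into num_slots output slots, writing zero for
// null slots. Returns the number of packed values consumed.
int DecodeSpaced32(const uint32_t* packed, int num_packed,
                   const uint8_t* valid_bits, int num_slots, uint32_t* out) {
  static const std::array<Mask, 16> table = BuildShuffleTable();
  assert(num_slots % 4 == 0);  // the sketch handles whole 4-slot blocks only
  int consumed = 0;
  for (int slot = 0; slot < num_slots; slot += 4) {
    // slot is a multiple of 4, so the 4 bits never straddle a byte.
    const int bits = (valid_bits[slot / 8] >> (slot % 8)) & 0xF;
    // Copy through a local buffer so the 16-byte load never reads past the
    // end of the packed input.
    uint8_t in[16] = {0};
    const int avail = std::max(0, std::min(4, num_packed - consumed));
    std::memcpy(in, packed + consumed, static_cast<size_t>(avail) * 4);
    Shuffle16(in, table[bits], reinterpret_cast<uint8_t*>(out + slot));
    consumed += (bits & 1) + ((bits >> 1) & 1) + ((bits >> 2) & 1) +
                ((bits >> 3) & 1);
  }
  return consumed;
}
```

With four 32-bit lanes per register the table needs only 2^4 = 16 masks 
(256 bytes). A uint8 path has 16 lanes per register, hence 2^16 masks of 16 
bytes each, which is 1 MiB.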

 

Before:

BM_PlainDecodingSpacedFloat/1024 890 ns 889 ns 786208 bytes_per_second=4.29213G/s
BM_PlainDecodingSpacedDouble/1024 928 ns 927 ns 754657 bytes_per_second=8.22773G/s

 

After:

BM_PlainDecodingSpacedFloat/1024 456 ns 455 ns 1536349 bytes_per_second=8.37965G/s
BM_PlainDecodingSpacedDouble/1024 647 ns 646 ns 1082324 bytes_per_second=11.8107G/s

> [C++] Experiment to see if using SIMD shuffle operations for DecodeSpaced 
> improves performance
> ----------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1841
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1841
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Micah Kornfield
>            Assignee: Micah Kornfield
>            Priority: Major
>         Attachments: image-2020-04-14-15-01-48-222.png
>
>
> Follow-up from PARQUET-1840: in the current benchmarks, removing the memset 
> seems to either have no impact or be slightly worse. We should investigate 
> using SIMD operations to speed up spacing. 
>  
> As part of this, we can see whether moving the memset so that it covers only 
> the uninitialized values, after all required values have been moved, 
> provides any speedup.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
