mapleFU commented on code in PR #40335:
URL: https://github.com/apache/arrow/pull/40335#discussion_r1518057437
##########
cpp/src/arrow/util/byte_stream_split_internal.h:
##########
@@ -117,20 +122,22 @@ void ByteStreamSplitEncodeSse2(const uint8_t* raw_values, const int64_t num_valu
// The current shuffling algorithm diverges for float and double types but the compiler
// should be able to remove the branch since only one path is taken for each template
// instantiation.
- // Example run for floats:
+ // Example run for 32-bit variables:
// Step 0, copy:
// 0: ABCD ABCD ABCD ABCD 1: ABCD ABCD ABCD ABCD ...
- // Step 1: _mm_unpacklo_epi8 and mm_unpackhi_epi8:
+ // Step 1: simd_batch<int8_t, 8>::zip_lo and simd_batch<int8_t, 8>::zip_hi:
// 0: AABB CCDD AABB CCDD 1: AABB CCDD AABB CCDD ...
// 0: AAAA BBBB CCCC DDDD 1: AAAA BBBB CCCC DDDD ...
Review Comment:
```
// The shuffling of bytes is performed through the unpack intrinsics.
// In my measurements this gives better performance than an implementation
// which uses the shuffle intrinsics.
for (int stage_lvl = 0; stage_lvl < 2; ++stage_lvl) {
  for (int i = 0; i < kNumStreams / 2; ++i) {
    stage[stage_lvl + 1][i * 2] =
        xsimd::zip_lo(stage[stage_lvl][i * 2], stage[stage_lvl][i * 2 + 1]);
    stage[stage_lvl + 1][i * 2 + 1] =
        xsimd::zip_hi(stage[stage_lvl][i * 2], stage[stage_lvl][i * 2 + 1]);
  }
}
```
@pitrou this step has 2 stages. Which does this
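
To make the two stages concrete, here is a minimal scalar sketch of the quoted loop for 32-bit values (4 byte streams). It does not use xsimd: `Reg`, `ZipLo`, `ZipHi`, and `ZipStages` are hypothetical names introduced only for this illustration, and `ZipLo`/`ZipHi` assume the usual byte-interleave semantics of `_mm_unpacklo_epi8` / `_mm_unpackhi_epi8` on 16-byte registers. Each input byte is labelled 'A'..'D' by its position within its 32-bit value, matching the notation in the code comment.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Scalar stand-in for one 128-bit register (16 bytes). Hypothetical type.
using Reg = std::array<uint8_t, 16>;
constexpr int kNumStreams = 4;  // 32-bit values => 4 byte streams

// Interleave the low 8 bytes of a and b: a0 b0 a1 b1 ... a7 b7.
Reg ZipLo(const Reg& a, const Reg& b) {
  Reg out{};
  for (std::size_t j = 0; j < 8; ++j) {
    out[2 * j] = a[j];
    out[2 * j + 1] = b[j];
  }
  return out;
}

// Interleave the high 8 bytes of a and b: a8 b8 a9 b9 ... a15 b15.
Reg ZipHi(const Reg& a, const Reg& b) {
  Reg out{};
  for (std::size_t j = 0; j < 8; ++j) {
    out[2 * j] = a[8 + j];
    out[2 * j + 1] = b[8 + j];
  }
  return out;
}

// Applies `num_stages` zip stages (same loop shape as the quoted code) to
// four registers that each start as "ABCDABCDABCDABCD", and returns the
// byte labels of every register afterwards.
std::array<std::string, kNumStreams> ZipStages(int num_stages) {
  std::array<Reg, kNumStreams> cur{};
  for (auto& reg : cur) {
    for (std::size_t pos = 0; pos < reg.size(); ++pos) {
      reg[pos] = static_cast<uint8_t>('A' + pos % 4);
    }
  }
  for (int stage_lvl = 0; stage_lvl < num_stages; ++stage_lvl) {
    std::array<Reg, kNumStreams> next{};
    for (int i = 0; i < kNumStreams / 2; ++i) {
      next[i * 2] = ZipLo(cur[i * 2], cur[i * 2 + 1]);
      next[i * 2 + 1] = ZipHi(cur[i * 2], cur[i * 2 + 1]);
    }
    cur = next;
  }
  std::array<std::string, kNumStreams> labels;
  for (int i = 0; i < kNumStreams; ++i) {
    for (uint8_t byte : cur[i]) {
      labels[i].push_back(static_cast<char>(byte));
    }
  }
  return labels;
}
```

Under these assumptions, `ZipStages(1)` yields `AABBCCDDAABBCCDD` in every register and `ZipStages(2)` yields `AAAABBBBCCCCDDDD`, so the two pattern lines listed under "Step 1" in the comment correspond to the first and second iteration of `stage_lvl`.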