HippoBaro commented on code in PR #9848:
URL: https://github.com/apache/arrow-rs/pull/9848#discussion_r3383472678
##########
parquet/src/util/bit_util.rs:
##########
@@ -936,6 +936,44 @@ impl From<Vec<u8>> for BitReader {
}
}
+/// Parallel bit extract: for each set bit in `mask`, extract the
+/// corresponding bit from `value` and pack them contiguously into the low
+/// bits of the return value.
+///
+/// Equivalent to the x86 BMI2 `PEXT` instruction. When compiled with the
+/// `bmi2` target feature enabled (for example `-C target-cpu=x86-64-v3`)
+/// this lowers to the hardware `pext` instruction; otherwise it falls back
+/// to a portable scalar loop.
+///
+/// Replace with `value.compress(mask)` when `uint_gather_scatter_bits`
+/// is stabilised: <https://github.com/rust-lang/rust/issues/149069>
+#[inline]
Review Comment:
An ideal outcome here, IMO, would be to restructure this code so that we can
use SVE instructions on aarch64. We use almost exclusively ARM instances these
days, and these boolean ops could be sped up quite a bit.
The challenge is that SVE doesn't use fixed-size vectors (the vector length
is determined by the hardware), so the shape of the code has to change.
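
For reference, the portable scalar fallback that the doc comment describes could look roughly like the sketch below. This is not the code from the PR: the name `pext_scalar` is hypothetical, and it only illustrates the semantics (gather the bits of `value` selected by `mask` into the low bits of the result) that any SVE-based restructuring would need to preserve.

```rust
/// Hypothetical portable parallel bit extract (illustration only, not the
/// PR's implementation). For each set bit in `mask`, from least to most
/// significant, copy the corresponding bit of `value` into the next free
/// low bit of the result.
fn pext_scalar(value: u64, mut mask: u64) -> u64 {
    let mut result = 0u64;
    let mut out_bit = 0u32;
    while mask != 0 {
        // Isolate the lowest set bit of the mask.
        let lowest = mask & mask.wrapping_neg();
        if value & lowest != 0 {
            result |= 1u64 << out_bit;
        }
        out_bit += 1;
        // Clear the lowest set bit and move on to the next one.
        mask &= mask - 1;
    }
    result
}
```

For example, `pext_scalar(0b1011_0110, 0b1111_0000)` selects the upper nibble of the value and returns `0b1011`. The loop runs once per set bit in `mask`, which is why a hardware instruction is attractive for dense masks.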
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]