HadrienG2 commented on code in PR #5760:
URL: https://github.com/apache/arrow-rs/pull/5760#discussion_r1599381726
##########
arrow-array/src/builder/boolean_builder.rs:
##########
@@ -166,6 +166,14 @@ impl BooleanBuilder {
BooleanArray::from(array_data)
}
+ /// Returns the current values buffer as a slice
+ ///
+ /// Boolean values are bit-packed into bytes, so to read the i-th boolean,
+ /// you must compute `values_slice[i / 8] & (1 << (i % 8)) == 0`.
Review Comment:
Will do so, but out of curiosity, did you benchmark that the current
implementation is actually faster? A [quick ASM
comparison](https://godbolt.org/z/xbrccY18M) makes me a little surprised that
it is the case.
- Indexing a bitmap is a common operation that optimizing compilers know how
deal with pretty well, trivially turning `x / 8` into `x >> 3`, `x % 8` into `x
& 7` and `int & (1 << N)` where N can be proven at compile time to be in bounds
into a bit test. So effectively we're comparing a `bt/setb` pair into to a
table lookup + `test/setne` pair here, and from <https://uops.info/table.html>,
it does not look like `bt/setb` is particularly badly implemented on current
CPUs.
- Furthermore, since the table is specified to be a const, and thus does not
have a dedicated memory location in global program data, extra instructions had
to be generated to push the table to the stack on every program call, adding a
memory store (which worries me because these have an unfortunate tendency to
create superscalar memory hazards that worsen the latency of other memory
instructions surrounding them) on the hot code path.
* The store can be avoided (or more accurately turned into a load, which
will reliably hit L1 if the function is called in the kind of tight loops where
its performance matters) by turning a const into a static, as shown in the
linked snippet.
Against, I'm just static-analyzing the assembly by eye here, did not take
the time to set up a proper microbenchmark. Just surprised that it makes a
positive difference.
##########
arrow-array/src/builder/boolean_builder.rs:
##########
@@ -166,6 +166,14 @@ impl BooleanBuilder {
BooleanArray::from(array_data)
}
+ /// Returns the current values buffer as a slice
+ ///
+ /// Boolean values are bit-packed into bytes, so to read the i-th boolean,
+ /// you must compute `values_slice[i / 8] & (1 << (i % 8)) == 0`.
Review Comment:
Will do so, but out of curiosity, did you benchmark that the current
implementation is actually faster? A [quick ASM
comparison](https://godbolt.org/z/xbrccY18M) makes me a little surprised that
it is the case.
- Indexing a bitmap is a common operation that optimizing compilers know how
deal with pretty well, trivially turning `x / 8` into `x >> 3`, `x % 8` into `x
& 7` and `int & (1 << N)` where N can be proven at compile time to be in bounds
into a bit test. So effectively we're comparing a `bt/setb` pair into to a
table lookup + `test/setne` pair here, and from <https://uops.info/table.html>,
it does not look like `bt/setb` is particularly badly implemented on current
CPUs.
- Furthermore, since the table is specified to be a const, and thus does not
have a dedicated memory location in global program data, extra instructions had
to be generated to push the table to the stack on every program call, adding a
memory store (which worries me because these have an unfortunate tendency to
create superscalar memory hazards that worsen the latency of loads surrounding
them) on the hot code path.
* The store can be avoided (or more accurately turned into a load, which
will reliably hit L1 if the function is called in the kind of tight loops where
its performance matters) by turning a const into a static, as shown in the
linked snippet.
Against, I'm just static-analyzing the assembly by eye here, did not take
the time to set up a proper microbenchmark. Just surprised that it makes a
positive difference.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]