vertexclique commented on a change in pull request #7061:
URL: https://github.com/apache/arrow/pull/7061#discussion_r418999354
##########
File path: rust/arrow/src/util/bit_util.rs
##########
@@ -148,11 +148,17 @@ pub fn count_set_bits_offset(data: &[u8], offset: usize,
length: usize) -> usize
/// Returns the ceil of `value`/`divisor`
#[inline]
pub fn ceil(value: usize, divisor: usize) -> usize {
- let mut result = value / divisor;
- if value % divisor != 0 {
- result += 1
- };
- result
+ if value == 0_usize {
Review comment:
Oh, it is. Meanwhile, while looking for zero-sized allocations, I came
across this chunk of code:
```
impl BufferBuilderTrait<BooleanType> for BufferBuilder<BooleanType> {
    fn new(capacity: usize) -> Self {
        let byte_capacity = bit_util::ceil(capacity, 8);
        let actual_capacity =
            bit_util::round_upto_multiple_of_64(byte_capacity);
        let mut buffer = MutableBuffer::new(actual_capacity);
        buffer.set_null_bits(0, actual_capacity);
        Self {
            buffer,
            len: 0,
            _marker: PhantomData,
        }
    }
```
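To make the zero-capacity case concrete, here is a minimal standalone sketch of the arithmetic in `new`. The two helpers are simplified stand-ins that I am assuming for `bit_util::ceil` and `bit_util::round_upto_multiple_of_64`; they are not the crate's actual code, and `boolean_byte_capacity` is a hypothetical name.

```rust
/// Stand-in for `bit_util::ceil` (assumed behavior: ceiling of value / divisor).
fn ceil(value: usize, divisor: usize) -> usize {
    let mut result = value / divisor;
    if value % divisor != 0 {
        result += 1;
    }
    result
}

/// Stand-in for `bit_util::round_upto_multiple_of_64` (assumed behavior:
/// round up to the next multiple of 64, leaving exact multiples unchanged).
fn round_upto_multiple_of_64(num: usize) -> usize {
    (num + 63) & !63
}

/// Hypothetical helper mirroring the capacity computation in `new`:
/// bits -> bytes -> bytes padded to a multiple of 64.
fn boolean_byte_capacity(capacity: usize) -> usize {
    round_upto_multiple_of_64(ceil(capacity, 8))
}

fn main() {
    for &bits in &[0usize, 1, 8, 9, 512, 513] {
        println!("{} bits -> {} bytes", bits, boolean_byte_capacity(bits));
    }
}
```

Under these assumptions a requested capacity of 0 bits comes out as 0 bytes, which is the zero-sized allocation case, while any capacity from 1 to 512 bits comes out as 64 bytes.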
`BufferBuilderTrait` uses this code on every reallocation. Ceil is not
Euclidean C-division, according to this paper:
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/divmodnote-letter.pdf
So I thought it better to use the established C-division in this case, which
also improved things on that side.
A separate PR should be opened to fix this in Parquet too.
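For reference, a sketch of two ceiling-division variants on `usize`. The first is the quotient-and-remainder form removed in the diff above. The second, with an early return for zero, is only a guess at the shape of the change, since the diff shows nothing past the `value == 0_usize` check. Both function names are hypothetical.

```rust
/// The form removed by the diff: truncating quotient, bumped by one
/// when the division leaves a remainder.
fn ceil_div_rem(value: usize, divisor: usize) -> usize {
    let mut result = value / divisor;
    if value % divisor != 0 {
        result += 1;
    }
    result
}

/// Hypothetical variant: return early for zero, otherwise use a single
/// truncating division. `value - 1` cannot underflow because zero was
/// handled above.
fn ceil_zero_checked(value: usize, divisor: usize) -> usize {
    if value == 0 {
        return 0;
    }
    1 + (value - 1) / divisor
}

fn main() {
    // The two variants agree for every non-zero divisor in this range.
    for value in 0..1000usize {
        for divisor in 1..=64usize {
            assert_eq!(
                ceil_div_rem(value, divisor),
                ceil_zero_checked(value, divisor)
            );
        }
    }
    println!("ceil(9, 8) = {}", ceil_zero_checked(9, 8));
}
```

For unsigned operands, truncating division and flooring division give the same quotient, so the two variants differ only in how many divisions they perform and in what happens for a zero divisor: the first always panics, while the second returns 0 when `value` is also 0.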