ShaiviAgarwal2 commented on issue #39227:
URL: https://github.com/apache/arrow/issues/39227#issuecomment-1884260411
@mapleFU @JacobOgle As far as I understand, we need to optimize the
decoding of Boolean values in the Parquet C++ library. We could do that by
adding a condition that checks whether the size of the type `T` is 1 and, if
so, using a specialized decoding method.
Here is a possible approach that we could try. The code checks whether `T` is
one byte wide, which holds for `bool` (and also for other one-byte types such
as `uint8_t`), and if it is, unpacks the values one packed byte (eight values)
at a time. This should speed up the decoding of Boolean values.
```cpp
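if (sizeof(T) == 1) {
  // Assumes `i` starts at 0 and `buffer + byte_offset` points at the first
  // packed byte of this batch.
  const uint8_t* bool_buffer =
      reinterpret_cast<const uint8_t*>(buffer + byte_offset);
  while (i < batch_size) {
    // Decode up to 8 values (one packed byte) per iteration.
    int unpack_size = std::min(8, batch_size - i);
    uint8_t unpack_byte = bool_buffer[i / 8];
    for (int k = 0; k < unpack_size; ++k) {
      // Parquet bit-packs values LSB first, so bit k holds value i + k.
      v[i + k] = static_cast<T>((unpack_byte >> k) & 1);
    }
    i += unpack_size;
    byte_offset += unpack_size / 8;  // advances only for fully consumed bytes
  }
} else {
  // Existing code for other cases
  // ...
}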
```
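To make the idea easier to test in isolation, here is a minimal standalone sketch of the same byte-at-a-time unpacking. `UnpackBools` is a hypothetical helper name, not an existing Arrow or Parquet function, and it assumes the values are bit-packed least-significant bit first and that the input starts on a byte boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the Arrow/Parquet API).
// Unpacks `num_values` booleans from `packed`, where each value occupies one
// bit and values are stored least-significant bit first within each byte.
// `out` must have room for at least `num_values` elements.
template <typename T>
void UnpackBools(const uint8_t* packed, int num_values, T* out) {
  assert(num_values >= 0);
  int i = 0;
  while (i < num_values) {
    // Handle one packed byte (up to 8 values) per iteration.
    const int unpack_size = std::min(8, num_values - i);
    const uint8_t byte = packed[i / 8];
    for (int k = 0; k < unpack_size; ++k) {
      // Bit k of the current byte holds value i + k.
      out[i + k] = static_cast<T>((byte >> k) & 1);
    }
    i += unpack_size;
  }
}
```

For example, calling `UnpackBools(packed, 10, out)` on the two bytes `0x05, 0x02` would set `out[0]`, `out[2]`, and `out[9]` to true and leave the other seven values false. A real integration would also have to handle reads that start in the middle of a byte, which this sketch does not.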