ShaiviAgarwal2 commented on issue #39227: URL: https://github.com/apache/arrow/issues/39227#issuecomment-1884260411
@mapleFU @JacobOgle As far as I understand, the goal is to optimize the decoding of Boolean values in the Parquet C++ library. We could do this by adding a branch that checks whether the size of type `T` is 1 and, if so, uses a specialized decoding path.

Here is a possible approach we can try. The code treats `sizeof(T) == 1` as the Boolean case and unpacks each input byte directly into up to eight output values; all other types fall through to the existing code. This should speed up the decoding of Boolean values, but I have not benchmarked it yet.

```cpp
if (sizeof(T) == 1) {
  // Assumption: values are bit-packed one bit each, least-significant bit
  // first, and decoding starts on a byte boundary with i == 0.
  const uint8_t* bool_buffer = reinterpret_cast<const uint8_t*>(buffer + byte_offset);
  while (i < batch_size) {
    // Number of values to take from the current byte (fewer than 8 at the tail).
    int unpack_size = std::min(8, batch_size - i);
    uint8_t unpack_byte = bool_buffer[i / 8];
    for (int k = 0; k < unpack_size; ++k) {
      // Shift by k so each output gets its own bit, not the same bit 8 times.
      v[i + k] = static_cast<T>((unpack_byte >> k) & 1);
    }
    i += unpack_size;
    // Advances only past fully consumed bytes.
    byte_offset += unpack_size / 8;
  }
} else {
  // Existing code for other cases
  // ...
}
```
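
To check the bit order outside of the Parquet code, a minimal standalone sketch of the same unpacking loop could look like the following. `UnpackBools` is a hypothetical helper written only for illustration, not an existing Arrow or Parquet function, and it assumes the input is bit-packed least-significant bit first and starts on a byte boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Arrow or Parquet): expands `num_values`
// bit-packed booleans into one `bool` per value.
// Assumptions: one bit per value, least-significant bit first, and the first
// value sits in bit 0 of packed[0].
void UnpackBools(const uint8_t* packed, int num_values, bool* out) {
  assert(num_values >= 0);
  int i = 0;
  while (i < num_values) {
    // Take up to 8 values from the current byte; the last byte may be partial.
    const int unpack_size = std::min(8, num_values - i);
    const uint8_t unpack_byte = packed[i / 8];
    for (int k = 0; k < unpack_size; ++k) {
      // Bit k of the byte is the k-th value of this group.
      out[i + k] = ((unpack_byte >> k) & 1) != 0;
    }
    i += unpack_size;
  }
}
```

For example, unpacking 10 values from the bytes `0xA5, 0x03` should produce `1,0,1,0,0,1,0,1` from the first byte and `1,1` from the low two bits of the second.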