ShaiviAgarwal2 commented on issue #39227:
URL: https://github.com/apache/arrow/issues/39227#issuecomment-1884260411

   @mapleFU @JacobOgle As I understand it, the goal is to speed up decoding of Boolean values in the Parquet C++ library. One way to do that is to add a branch that checks whether the size of the type `T` is 1 and, if so, uses a specialized decoding path for it.
   
   Below is a possible implementation we could try. It checks whether `T` is a 1-byte type (such as `bool`) and, if so, unpacks the values one input byte (eight values) at a time. The intent is to speed up Boolean decoding, though that would still need to be confirmed with a benchmark.
   
   ```cpp
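   if (sizeof(T) == 1) {
       // Assumes each value is 1 bit wide, `i` starts at 0, and the reader is
       // byte-aligned. Bits are read LSB-first, the order Parquet uses for
       // bit-packed data, so bit k of a byte holds the k-th value in that byte.
       const uint8_t* bool_buffer = reinterpret_cast<const uint8_t*>(buffer + byte_offset);
       while (i < batch_size) {
           int unpack_size = std::min(8, batch_size - i);  // needs <algorithm>
           uint8_t unpack_byte = bool_buffer[i / 8];
           for (int k = 0; k < unpack_size; ++k) {
               // Shift by k (the value's position within this byte), not by i.
               v[i + k] = static_cast<T>((unpack_byte >> k) & 1);
           }
           i += unpack_size;
           // Advances only over fully consumed bytes; a trailing partial byte
           // would also need its bit offset tracked (not shown here).
           byte_offset += unpack_size / 8;
       }
   } else {
       // Existing code for other cases
       // ...
   }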
   ```
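   To make the bit order easier to check in isolation, here is a small self-contained sketch of the same byte-at-a-time idea outside the reader class. It assumes values are packed LSB-first, as in Parquet's PLAIN Boolean encoding and the bit-packed runs of the RLE/bit-packing hybrid. The function name `UnpackBoolsLsbFirst` is hypothetical and not an Arrow API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Arrow API): unpacks `num_values` 1-bit values,
// packed LSB-first, from `packed` into `out`.
inline void UnpackBoolsLsbFirst(const uint8_t* packed, int num_values, bool* out) {
  assert(num_values >= 0);
  const int full_bytes = num_values / 8;
  // Each full byte yields exactly 8 values, so the inner loop has a fixed
  // trip count that the compiler can unroll.
  for (int byte = 0; byte < full_bytes; ++byte) {
    const uint8_t b = packed[byte];
    for (int k = 0; k < 8; ++k) {
      out[byte * 8 + k] = ((b >> k) & 1) != 0;
    }
  }
  // The remaining (num_values % 8) values come from the low bits of the
  // last, partially used byte.
  const int tail = num_values % 8;
  if (tail > 0) {
    const uint8_t b = packed[full_bytes];
    for (int k = 0; k < tail; ++k) {
      out[full_bytes * 8 + k] = ((b >> k) & 1) != 0;
    }
  }
}
```

   For example, the byte `0x05` (binary `00000101`) decodes to true, false, true, false, false, false, false, false, because bit 0 is the first value.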
   
   
   

