Re: [PR] perf: Improved Bit (Un)packing Performance [arrow-nanoarrow]

via GitHub Fri, 06 Oct 2023 05:51:13 -0700


WillAyd commented on PR #280:
URL: https://github.com/apache/arrow-nanoarrow/pull/280#issuecomment-1750620993


   @paleolimbot does this program show any difference for you? Whether I 
compile with clang or gcc, I get a 2x speedup at -O3 for the version 
without shifts:
   
   ```c
   #include <stdio.h>
   #include <inttypes.h>
   #include <time.h>
   
   #define NANOSEC_PER_SEC 1000000000LL
   
   static inline void UnpackInt8Shifts(const uint8_t word, int8_t* out) {
     out[0] = (word >> 0) & 1;
     out[1] = (word >> 1) & 1;
     out[2] = (word >> 2) & 1;
     out[3] = (word >> 3) & 1;
     out[4] = (word >> 4) & 1;
     out[5] = (word >> 5) & 1;
     out[6] = (word >> 6) & 1;
     out[7] = (word >> 7) & 1;  
   }
   
   static inline void UnpackInt8NoShifts(const uint8_t word, int8_t* out) {
     out[0] = (word & 0x1) != 0;
     out[1] = (word & 0x2) != 0;
     out[2] = (word & 0x4) != 0;
     out[3] = (word & 0x8) != 0;
     out[4] = (word & 0x10) != 0;
     out[5] = (word & 0x20) != 0;
     out[6] = (word & 0x40) != 0;
     out[7] = (word & 0x80) != 0;
   }
   
   
   int main(void) {
     const size_t niters = 100000000;
     const uint8_t word = 0xaa;
     int8_t out[8];
     struct timespec start, end;
     
     clock_gettime(CLOCK_REALTIME, &start);
     for (size_t i = 0; i < niters; i++) {
       UnpackInt8Shifts(word, out);
     }
     clock_gettime(CLOCK_REALTIME, &end);
     printf("ns duration of UnpackInt8Shifts was: %lld\n",
            (end.tv_sec * NANOSEC_PER_SEC + end.tv_nsec) -
                (start.tv_sec * NANOSEC_PER_SEC + start.tv_nsec));
   
     clock_gettime(CLOCK_REALTIME, &start);
     for (size_t i = 0; i < niters; i++) {
       UnpackInt8NoShifts(word, out);
     }
     clock_gettime(CLOCK_REALTIME, &end);
     printf("ns duration of UnpackInt8NoShifts was: %lld\n",
            (end.tv_sec * NANOSEC_PER_SEC + end.tv_nsec) -
                (start.tv_sec * NANOSEC_PER_SEC + start.tv_nsec));
   
     return 0;
   }
   ```
   
   Still worthwhile, though I'm not sure yet how a 2x speedup in this base 
benchmark results in a 15x speedup in the Python benchmark on my platform.
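
One caveat with the program above: `word` is a compile-time constant and `out` is never read, so at -O3 a compiler is permitted to hoist or drop some of the stores, which may distort the measured ratio. A minimal sketch of one way to guard against that is below. It assumes two helpers that are not part of the PR (`UnpackVariantsAgree` and `SumUnpackedBits` are hypothetical names for illustration): the first checks that both variants produce identical output for every byte value, the second consumes the unpacked bits so the work has an observable result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shift-based unpacking, copied from the program above. */
static inline void UnpackInt8Shifts(const uint8_t word, int8_t* out) {
  out[0] = (word >> 0) & 1;
  out[1] = (word >> 1) & 1;
  out[2] = (word >> 2) & 1;
  out[3] = (word >> 3) & 1;
  out[4] = (word >> 4) & 1;
  out[5] = (word >> 5) & 1;
  out[6] = (word >> 6) & 1;
  out[7] = (word >> 7) & 1;
}

/* Mask-based unpacking, copied from the program above. */
static inline void UnpackInt8NoShifts(const uint8_t word, int8_t* out) {
  out[0] = (word & 0x1) != 0;
  out[1] = (word & 0x2) != 0;
  out[2] = (word & 0x4) != 0;
  out[3] = (word & 0x8) != 0;
  out[4] = (word & 0x10) != 0;
  out[5] = (word & 0x20) != 0;
  out[6] = (word & 0x40) != 0;
  out[7] = (word & 0x80) != 0;
}

/* Hypothetical helper (not in the PR): returns 1 if both variants
 * produce the same eight outputs for all 256 byte values, else 0. */
static int UnpackVariantsAgree(void) {
  for (int w = 0; w < 256; w++) {
    int8_t a[8];
    int8_t b[8];
    UnpackInt8Shifts((uint8_t)w, a);
    UnpackInt8NoShifts((uint8_t)w, b);
    if (memcmp(a, b, sizeof(a)) != 0) return 0;
  }
  return 1;
}

/* Hypothetical helper (not in the PR): unpacks n bytes and returns the
 * total number of set bits. Returning a value that depends on every
 * unpacked element keeps the stores from being treated as dead code. */
static uint64_t SumUnpackedBits(const uint8_t* words, size_t n, int use_shifts) {
  uint64_t total = 0;
  int8_t out[8];
  for (size_t i = 0; i < n; i++) {
    if (use_shifts) {
      UnpackInt8Shifts(words[i], out);
    } else {
      UnpackInt8NoShifts(words[i], out);
    }
    for (int j = 0; j < 8; j++) total += (uint64_t)out[j];
  }
  return total;
}
```

To use this in the timing loops, one option is to fill a buffer with input that is not known at compile time (for example, bytes read from a file or derived from `argv`), time a call to `SumUnpackedBits` for each variant, and print the returned totals. For interval timing, `CLOCK_MONOTONIC` is generally a safer choice than `CLOCK_REALTIME`, since the latter can jump if the system clock is adjusted.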


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

