[GitHub] [arrow] emkornfield commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

GitBox Wed, 03 Jun 2020 23:15:26 -0700


emkornfield commented on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-638625669



   > > If this is AdvanceTillWord, this hurts performance.
   > 
   > I don't think compiler-specific micro-optimization is a good practice. 
Generally, the compiler should make a better job if you expose all the code, 
rather than hide it.
   
   OK moved everything to the header.  I looked again at the performance 
numbers inlining seems to improve benchmarks for randomly distributed bits 
slightly (eyeballing looks like ~20ish percent over existing improvements) but 
makes the worst case regression 50-100% worse (i.e. in the worse case it is 
100% slower then before). 
   
   > 
   > > Is this mostly InvertRemainingBits? could you provide some more feedback 
on the algorithm structure you were thinking of?
   > 
   > It is. Basically, you could have a running word mask (or byte mask) of the 
remaining bits in the current word (or byte).
   
   Removed InvertRemainingBits and generate remaining bit mask on the fly.
   
   > 
   > > I would expect word level handling to be much faster for cases this is 
intended for (when there are actual runs). In practice, once integrated into a 
system you are likely correct though.
   > 
   > Yes, I think you're gonna hit diminishing returns when handling more than 
one byte at a time. Byte level handling would also alleviate the endianness 
concerns.
   
   I would rather not expend the effort to refactor this to byte at a time.  It 
will likely take a performance hit at least on microbenchmarks, and I'd rather 
spend the effort debugging and fixing the windows builds if that is OK.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] emkornfield commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

Reply via email to