emkornfield commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-638625669
> > If this is AdvanceTillWord, this hurts performance. > > I don't think compiler-specific micro-optimization is a good practice. Generally, the compiler should make a better job if you expose all the code, rather than hide it. OK moved everything to the header. I looked again at the performance numbers inlining seems to improve benchmarks for randomly distributed bits slightly (eyeballing looks like ~20ish percent over existing improvements) but makes the worst case regression 50-100% worse (i.e. in the worse case it is 100% slower then before). > > > Is this mostly InvertRemainingBits? could you provide some more feedback on the algorithm structure you were thinking of? > > It is. Basically, you could have a running word mask (or byte mask) of the remaining bits in the current word (or byte). Removed InvertRemainingBits and generate remaining bit mask on the fly. > > > I would expect word level handling to be much faster for cases this is intended for (when there are actual runs). In practice, once integrated into a system you are likely correct though. > > Yes, I think you're gonna hit diminishing returns when handling more than one byte at a time. Byte level handling would also alleviate the endianness concerns. I would rather not expend the effort to refactor this to byte at a time. It will likely take a performance hit at least on microbenchmarks, and I'd rather spend the effort debugging and fixing the windows builds if that is OK. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
