jbewing opened a new pull request, #5576:
URL: https://github.com/apache/hbase/pull/5576
### What
This PR updates `ByteBufferUtils.readVLong` to read 8 bytes from the underlying buffer at once (when enough bytes are available) instead of a single byte at a time, to improve the performance of reading vLongs.

### Implementation Notes
Previously, these methods relied on a `ByteVisitor` interface that abstracted the `byte get()` method between a Java `ByteBuffer` and an HBase `ByteBuff`. To make these updates, I needed access to a larger variety of methods on both `ByteBuff` and `ByteBuffer`, so instead of continuing to use the `ByteVisitor` abstraction, I split the implementations. They share a bit of common code but now have distinct differences. I believe replacing the single `ByteVisitor` call site with separate ones for `ByteBuffer` and `ByteBuff` also enables the JIT compiler to generate slightly better code, since each call site is now bimorphic.
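To make the fast-path idea concrete, here is a minimal sketch of it, assuming the `WritableUtils`-style vLong encoding that these methods decode. The class and method names are illustrative rather than the actual patch, and only the `java.nio.ByteBuffer` flavor is shown (the patch also has a `ByteBuff` variant); the buffer is assumed to be in its default big-endian byte order.

```java
import java.nio.ByteBuffer;

/**
 * Illustrative sketch of the 8-byte read idea for WritableUtils-style
 * vLongs. Not the actual patch code.
 */
public final class VLongSketch {

  /** Total size in bytes of a vLong, derived from its first byte. */
  static int decodeVIntSize(byte firstByte) {
    if (firstByte >= -112) {
      return 1;
    }
    return (firstByte < -120) ? (-119 - firstByte) : (-111 - firstByte);
  }

  /** Whether the first byte marks an encoded negative value. */
  static boolean isNegative(byte firstByte) {
    return firstByte < -120 || (firstByte >= -112 && firstByte < 0);
  }

  /** Reads a vLong, grabbing 8 bytes at once when the buffer allows it. */
  public static long readVLong(ByteBuffer buf) {
    byte firstByte = buf.get();
    int len = decodeVIntSize(firstByte);
    if (len == 1) {
      return firstByte;
    }
    int payload = len - 1; // bytes that follow the length byte
    long value;
    if (buf.remaining() >= 8) {
      // Fast path (the "padded" case): one big-endian 8-byte read, then an
      // unsigned shift that keeps only the top `payload` bytes.
      value = buf.getLong(buf.position()) >>> (8 * (8 - payload));
      buf.position(buf.position() + payload);
    } else {
      // Near the end of the buffer: fall back to byte-at-a-time reads.
      value = 0;
      for (int i = 0; i < payload; i++) {
        value = (value << 8) | (buf.get() & 0xFF);
      }
    }
    return isNegative(firstByte) ? ~value : value;
  }
}
```

The unsigned shift discards everything past the top `len - 1` bytes of the single 8-byte read, which is why the fast path needs at least 8 readable bytes after the length byte; that is the "padded" case in the benchmarks below.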
### Benchmarking
I wrote a JMH benchmark suite (I'll attach the code to the JIRA; a rough sketch of the harness shape is at the end of this description) that measures the performance of reading various vLongs from both on- and off-heap buffers, with and without padding (padding meaning the buffer is at least 9 bytes long, so the vectorized read path can be applied). For the unoptimized readVLong path, padding shouldn't make any difference; with the optimized method, only the padded benchmarks should show a substantial improvement, and they do show a pretty nice one.

```
Benchmark                                                        (vint)  Mode  Cnt   Score   Error  Units
ReadVLongBenchmark.readVLong_OffHeapBB                                9  avgt    5   4.643 ± 2.917  ns/op
ReadVLongBenchmark.readVLong_OffHeapBB                              512  avgt    5   8.063 ± 0.187  ns/op
ReadVLongBenchmark.readVLong_OffHeapBB                       2146483640  avgt    5  11.999 ± 0.314  ns/op
ReadVLongBenchmark.readVLong_OffHeapBB                    1700104028981  avgt    5  14.880 ± 0.698  ns/op
ReadVLongBenchmark.readVLong_OffHeapBB_Padded                         9  avgt    5   4.233 ± 0.136  ns/op
ReadVLongBenchmark.readVLong_OffHeapBB_Padded                       512  avgt    5   7.986 ± 0.048  ns/op
ReadVLongBenchmark.readVLong_OffHeapBB_Padded                2146483640  avgt    5  12.014 ± 0.012  ns/op
ReadVLongBenchmark.readVLong_OffHeapBB_Padded             1700104028981  avgt    5  14.655 ± 2.216  ns/op
ReadVLongBenchmark.readVLong_OnHeapBB                                 9  avgt    5   4.639 ± 0.012  ns/op
ReadVLongBenchmark.readVLong_OnHeapBB                               512  avgt    5   9.771 ± 0.357  ns/op
ReadVLongBenchmark.readVLong_OnHeapBB                        2146483640  avgt    5  13.928 ± 0.557  ns/op
ReadVLongBenchmark.readVLong_OnHeapBB                     1700104028981  avgt    5  17.487 ± 4.527  ns/op
ReadVLongBenchmark.readVLong_OnHeapBB_Padded                          9  avgt    5   5.245 ± 0.019  ns/op
ReadVLongBenchmark.readVLong_OnHeapBB_Padded                        512  avgt    5  10.086 ± 0.317  ns/op
ReadVLongBenchmark.readVLong_OnHeapBB_Padded                 2146483640  avgt    5  13.764 ± 0.100  ns/op
ReadVLongBenchmark.readVLong_OnHeapBB_Padded              1700104028981  avgt    5  17.200 ± 0.913  ns/op
ReadVLongBenchmark.readVLong_Optimized_OffHeapBB                      9  avgt    5   4.258 ± 0.012  ns/op
ReadVLongBenchmark.readVLong_Optimized_OffHeapBB                    512  avgt    5   8.621 ± 0.339  ns/op
ReadVLongBenchmark.readVLong_Optimized_OffHeapBB             2146483640  avgt    5  12.481 ± 2.609  ns/op
ReadVLongBenchmark.readVLong_Optimized_OffHeapBB          1700104028981  avgt    5  14.211 ± 0.041  ns/op
ReadVLongBenchmark.readVLong_Optimized_OffHeapBB_Padded               9  avgt    5   4.222 ± 0.007  ns/op
ReadVLongBenchmark.readVLong_Optimized_OffHeapBB_Padded             512  avgt    5   8.830 ± 0.022  ns/op
ReadVLongBenchmark.readVLong_Optimized_OffHeapBB_Padded      2146483640  avgt    5   8.998 ± 1.280  ns/op
ReadVLongBenchmark.readVLong_Optimized_OffHeapBB_Padded   1700104028981  avgt    5   8.850 ± 0.047  ns/op
ReadVLongBenchmark.readVLong_Optimized_OnHeapBB                       9  avgt    5   4.751 ± 0.732  ns/op
ReadVLongBenchmark.readVLong_Optimized_OnHeapBB                     512  avgt    5  10.575 ± 0.024  ns/op
ReadVLongBenchmark.readVLong_Optimized_OnHeapBB              2146483640  avgt    5  14.231 ± 0.385  ns/op
ReadVLongBenchmark.readVLong_Optimized_OnHeapBB           1700104028981  avgt    5  17.252 ± 0.064  ns/op
ReadVLongBenchmark.readVLong_Optimized_OnHeapBB_Padded                9  avgt    5   4.680 ± 0.108  ns/op
ReadVLongBenchmark.readVLong_Optimized_OnHeapBB_Padded              512  avgt    5   9.719 ± 1.401  ns/op
ReadVLongBenchmark.readVLong_Optimized_OnHeapBB_Padded       2146483640  avgt    5   9.511 ± 0.219  ns/op
ReadVLongBenchmark.readVLong_Optimized_OnHeapBB_Padded    1700104028981  avgt    5   9.464 ± 0.019  ns/op
```

In terms of how this microbenchmark translates to better seek performance, I'm seeing a consistent 20% improvement in the runtime of `TestDataBlockEncoders` with this patch versus without it.

[HBASE-28256](https://issues.apache.org/jira/browse/HBASE-28256)
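Since the full suite will live on the JIRA, here is only a rough, hypothetical sketch of what such a harness can look like. It reuses the `VLongSketch` class from the sketch above, uses the same parameter values as the results table, and mirrors `WritableUtils.writeVLong` in its encoder; only the on-heap variants are shown (the off-heap ones would use `ByteBuffer.allocateDirect` the same way).

```java
import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class ReadVLongBenchmarkSketch {

  // Same test values as the results table above.
  @Param({ "9", "512", "2146483640", "1700104028981" })
  public long vint;

  private ByteBuffer onHeap;
  private ByteBuffer onHeapPadded;

  @Setup
  public void setUp() {
    onHeap = encode(vint, 0);       // limit ends at the vLong: byte-at-a-time path
    onHeapPadded = encode(vint, 8); // >= 8 trailing bytes: fast path eligible
  }

  @Benchmark
  public long readVLong_OnHeapBB() {
    onHeap.rewind();
    return VLongSketch.readVLong(onHeap);
  }

  @Benchmark
  public long readVLong_OnHeapBB_Padded() {
    onHeapPadded.rewind();
    return VLongSketch.readVLong(onHeapPadded);
  }

  /** Writes {@code value} in WritableUtils vLong format, then {@code pad} zero bytes. */
  private static ByteBuffer encode(long value, int pad) {
    ByteBuffer buf = ByteBuffer.allocate(9 + pad); // a vLong is at most 9 bytes
    if (value >= -112 && value <= 127) {
      buf.put((byte) value);
    } else {
      long v = value;
      int len = -112;
      if (v < 0) {
        v = ~v;
        len = -120;
      }
      for (long tmp = v; tmp != 0; tmp >>>= 8) {
        len--;
      }
      buf.put((byte) len);
      int numBytes = (len < -120) ? -(len + 120) : -(len + 112);
      for (int idx = numBytes; idx != 0; idx--) {
        buf.put((byte) (v >>> (8 * (idx - 1))));
      }
    }
    for (int i = 0; i < pad; i++) {
      buf.put((byte) 0);
    }
    buf.flip();
    return buf;
  }
}
```

The real suite presumably also covers the `ByteBuff` variants and both readVLong implementations side by side; this sketch is only meant to show the shape of the padded/unpadded comparison.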