[ https://issues.apache.org/jira/browse/HBASE-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799546#comment-17799546 ]
Becker Ewing commented on HBASE-28256:
--------------------------------------

At the higher end (longs that encode to 6 or more vLong bytes), readVLongTimestamp appears to outperform readVLongHBase14186 when padding exists (which I'm pretty sure will be the common case) and when we're not using the "none" recycler (i.e. the performance region servers will see until HBASE-27730 lands). Since I plan on getting to https://issues.apache.org/jira/browse/HBASE-27730 soon, this likely isn't a huge issue, as readVLongHBase14186 generally performs much better on the "none" recycler.

> Enhance ByteBufferUtils.readVLong to read 8 bytes at a time
> -----------------------------------------------------------
>
>                 Key: HBASE-28256
>                 URL: https://issues.apache.org/jira/browse/HBASE-28256
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: Becker Ewing
>            Assignee: Becker Ewing
>            Priority: Major
>         Attachments: ReadVLongBenchmark.zip, async-prof-rs-cpu.html
>
> Currently, ByteBufferUtils.readVLong is used to decode rows in all data block encodings in order to read the memstoreTs field. For a data block encoding like prefix, ByteBufferUtils.readVLong can surprisingly occupy over 50% of the CPU time in BufferedEncodedSeeker.decodeNext (which can be quite a hot method in seek operations).
>
> Since memstoreTs will typically require at least 6 bytes to store, we could look to vectorize the read path for readVLong to read 8 bytes at a time instead of a single byte at a time (like in https://issues.apache.org/jira/browse/HBASE-28025) in order to increase performance.
>
> Attached is a CPU flamegraph of a region server process, which shows that we spend a surprising amount of time decoding rows from the DBE in ByteBufferUtils.readVLong.
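To make the 8-bytes-at-a-time idea concrete, here is a minimal sketch of decoding a WritableUtils-style vLong with a single wide read in place of the byte-at-a-time loop. This is not HBase's actual readVLongHBase14186 or readVLongTimestamp implementation; the class and method names are hypothetical, and it assumes a big-endian ByteBuffer with at least 8 readable bytes after the length byte, i.e. the "padding exists" case discussed in the comment above.

{code:java}
import java.nio.ByteBuffer;

public final class VLongWideReadSketch {

  /**
   * Decode a WritableUtils-style vLong, fetching the payload with one
   * 8-byte read instead of a byte-at-a-time loop. Assumes the buffer
   * uses big-endian order (the ByteBuffer default) and that at least
   * 8 bytes are readable at the payload offset ("padding exists").
   */
  static long readVLongWide(ByteBuffer buf) {
    byte firstByte = buf.get();
    if (firstByte >= -112) {
      return firstByte; // single-byte encoding: the value is the length byte itself
    }
    // First byte encodes sign and payload length (1..8 bytes).
    boolean negative = firstByte < -120;
    int len = negative ? -120 - firstByte : -112 - firstByte;

    // One wide read covering every possible payload byte; anything past the
    // actual payload is discarded by the shift below.
    long wide = buf.getLong(buf.position());
    buf.position(buf.position() + len);

    long result = wide >>> (8 * (8 - len));
    return negative ? ~result : result;
  }
}
{code}

The byte-at-a-time baseline performs the same shift-and-mask work once per payload byte, so the wide read trades up to eight buffer accesses for a single load plus one shift, which is consistent with the benchmark gap above opening up at the 6-byte-and-larger encodings.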