[ https://issues.apache.org/jira/browse/HBASE-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798818#comment-17798818 ]
Becker Ewing commented on HBASE-28256:
--------------------------------------

I want to call out that this is a very similar optimization to what is done in HBASE-14186, just taken a bit further: always read 8 bytes and use only 6 of them, instead of doing a 4-byte read followed by a 2-byte read.

> Enhance ByteBufferUtils.readVLong to read 8 bytes at a time
> -----------------------------------------------------------
>
>                 Key: HBASE-28256
>                 URL: https://issues.apache.org/jira/browse/HBASE-28256
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: Becker Ewing
>            Assignee: Becker Ewing
>            Priority: Major
>         Attachments: ReadVLongBenchmark.zip, async-prof-rs-cpu.html
>
>
> Currently, ByteBufferUtils.readVLong is used to decode rows in all data block
> encodings in order to read the memstoreTs field. For a data block encoding
> like prefix, ByteBufferUtils.readVLong can occupy over 50% of the CPU time in
> BufferedEncodedSeeker.decodeNext (which can be quite a hot method in seek
> operations).
>
> Since memstoreTs will typically require at least 6 bytes to store, we could
> look to vectorize the read path for readVLong to read 8 bytes at a time
> instead of a single byte at a time (like in
> https://issues.apache.org/jira/browse/HBASE-28025) in order to increase
> performance.
>
> Attached is a CPU flamegraph of a region server process which shows that we
> spend a surprising amount of time decoding rows from the DBE in
> ByteBufferUtils.readVLong.
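
For illustration only, here is a minimal sketch of the "read 8 bytes, keep what you need" idea described above. It is not the HBASE-28256 patch. It assumes the Hadoop WritableUtils vlong layout (a length/sign marker byte followed by 1 to 8 big-endian payload bytes) and a big-endian ByteBuffer. The class name VLongSketch and its helper methods are hypothetical and exist only for this example.

```java
import java.nio.ByteBuffer;

public final class VLongSketch {
  private VLongSketch() {}

  /** Total encoded length (1..9) implied by the first byte of a Hadoop-style vlong. */
  public static int encodedSize(byte first) {
    if (first >= -112) return 1;          // value is stored in the first byte itself
    if (first < -120) return -119 - first; // negative multi-byte value
    return -111 - first;                   // positive multi-byte value
  }

  /** True when the first byte marks a negative value. */
  public static boolean isNegative(byte first) {
    return first < -120 || (first >= -112 && first < 0);
  }

  /** Decodes one vlong at the buffer's position and advances past it. */
  public static long readVLong(ByteBuffer buf) {
    byte first = buf.get();
    int size = encodedSize(first);
    if (size == 1) return first;
    int payload = size - 1; // 1..8 payload bytes follow the marker byte
    long value;
    if (buf.remaining() >= Long.BYTES) {
      // Fast path: one absolute 8-byte read, then shift away the bytes that
      // belong to whatever follows this vlong. Assumes big-endian byte order.
      int pos = buf.position();
      long word = buf.getLong(pos);
      value = word >>> ((Long.BYTES - payload) * Byte.SIZE);
      buf.position(pos + payload);
    } else {
      // Slow path near the end of the buffer: one byte at a time.
      value = 0;
      for (int i = 0; i < payload; i++) {
        value = (value << 8) | (buf.get() & 0xFF);
      }
    }
    return isNegative(first) ? ~value : value;
  }

  /** Encoder in the same layout, included only so the sketch can round-trip. */
  public static void writeVLong(ByteBuffer buf, long v) {
    if (v >= -112 && v <= 127) {
      buf.put((byte) v);
      return;
    }
    int len = -112;
    if (v < 0) {
      v = ~v;
      len = -120;
    }
    for (long tmp = v; tmp != 0; tmp >>= 8) len--;
    buf.put((byte) len);
    int payload = (len < -120) ? -(len + 120) : -(len + 112);
    for (int idx = payload; idx != 0; idx--) {
      buf.put((byte) (v >>> ((idx - 1) * 8)));
    }
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(32);
    writeVLong(buf, 1702944000000L); // a 6-payload-byte value, like a typical memstoreTs
    buf.position(0);                 // limit stays at 32, so the fast path is taken
    System.out.println(readVLong(buf));
  }
}
```

The fast path trades a loop of dependent single-byte reads for one wide read plus a shift. The remaining() check keeps the wide read from running past the end of the buffer, at the cost of a slower path for values stored in the last few bytes.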