jbewing commented on PR #5576: URL: https://github.com/apache/hbase/pull/5576#issuecomment-1854065824
> In general, if we choose to use VLong, it means usually the value will be small, for example, only 1 or 2 bytes, so I wonder whether your JMH covers the most common scenario?

The benchmark does cover a variety of vLong sizes. The microbenchmarks above have breakdowns for the following vLongs: 9, 512, 2146483640, and 1700104028981, which encode to 1, 3, 5, and 7 bytes respectively (so a wide range). In practice, `ByteBufferUtils.readVLong` is only used in the HBase codebase to decode the memstoreTs for the CopyKey, Diff, Fast Diff, Prefix, and RIV1 DBE encodings, so performance when decoding a 7 byte vLong (an actual epoch-millis timestamp is 6 bytes, plus 1 byte of vLong encoding overhead) is the most important gauge of how this change will affect performance. I agree that, in general, callers choose the vLong encoding because they expect a fair amount of the data to be smaller than 7 or 8 bytes, so the encoding saves space. In practice, though, `ByteBufferUtils.readVLong` isn't used that way.

> For random long value which is encoded as VLong, I think read 8 bytes at once will be faster, but what if the values are often only 1 or 2 bytes?

From what the benchmarks show, there is no regression at all in the 1 byte case, as the code is identical up to that point. When a 2 byte vLong is read, the benchmarks show a small performance penalty. On a modern machine, reading an 8 byte word appears to cost about the same as reading a single byte, so the hit isn't drastic (at least as benchmarked on my machine). Again, because of how this particular method is used to decode vLongs in the HBase codebase, we should focus on the performance of the 1700104028981 case, as it most closely represents decoding a timestamp.
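For context, the vLong format at issue is the Hadoop/HBase Writable-style variable-length long: values in [-112, 127] fit in one byte, and larger values get a one-byte length marker followed by big-endian payload bytes. Below is a minimal, self-contained sketch (class and method names are mine, not the HBase API) showing that encoding and the classic byte-at-a-time decode loop this PR optimizes, verifying the 1/3/5/7-byte sizes quoted above:

```java
import java.nio.ByteBuffer;

public class VLongDemo {
    // Sketch of the Writable vLong encoding: values in [-112, 127] occupy
    // one byte; otherwise the first byte encodes sign and payload length.
    static void writeVLong(ByteBuffer buf, long v) {
        if (v >= -112 && v <= 127) {
            buf.put((byte) v);
            return;
        }
        int len = -112;
        if (v < 0) {
            v = ~v;        // flip to non-negative
            len = -120;    // negative-range markers start below -120
        }
        long tmp = v;
        while (tmp != 0) { // count payload bytes
            tmp >>>= 8;
            len--;
        }
        buf.put((byte) len);
        int n = (len < -120) ? -(len + 120) : -(len + 112);
        for (int i = n - 1; i >= 0; i--) {
            buf.put((byte) (v >>> (8 * i))); // big-endian payload
        }
    }

    // Byte-at-a-time decode, mirroring the pre-optimization loop the
    // comment describes: one bounds-checked read per payload byte.
    static long readVLong(ByteBuffer buf) {
        byte first = buf.get();
        if (first >= -112) {
            return first;  // single-byte fast path, unchanged by the PR
        }
        boolean negative = first < -120;
        int n = negative ? -(first + 120) : -(first + 112);
        long v = 0;
        for (int i = 0; i < n; i++) {
            v = (v << 8) | (buf.get() & 0xff);
        }
        return negative ? ~v : v;
    }

    public static void main(String[] args) {
        long[] samples = {9L, 512L, 2146483640L, 1700104028981L};
        for (long sample : samples) {
            ByteBuffer buf = ByteBuffer.allocate(9);
            writeVLong(buf, sample);
            System.out.println(sample + " -> " + buf.position() + " bytes");
            buf.flip();
            if (readVLong(buf) != sample) {
                throw new AssertionError("round-trip failed for " + sample);
            }
        }
    }
}
```

The optimized path in the PR replaces the per-byte loop with a single 8-byte word read plus masking/shifting, which is why the 1-byte fast path is untouched and only the multi-byte cases change.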
In any case, if we were to start using vLongs more widely, the performance of the optimized method in the 2 byte case isn't really much worse than the current behavior.