[
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589470#comment-14589470
]
Benedict commented on CASSANDRA-9499:
-------------------------------------
I'm confused as to why we need 10 bytes. Pretty much by definition, a
continuation-bit encoding needs at most 9 bytes to represent 8 bytes. Let's not use
Google's implementation. It looks pretty rubbish.
The reason they use 10 bytes is they cannot be bothered to realise the last
byte does not need a continuation bit. They also use a *terrible*
implementation for deciding how long it needs to be.
Here's a simple change that makes it 9 bytes and is easily optimised: the
continuation bits are all shifted to the first byte, which effectively encodes
the length in run-length encoding, i.e. as the number of contiguous high-order
bits that are set to 1. That is, {{length = Integer.numberOfLeadingZeros(firstByte ^
(byte) -1)}}.
This way we read the first byte, and if there are any more to read, we read a
long, and quickly truncate.
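
A rough sketch of that layout follows, under several assumptions: values are
treated as unsigned 64-bit quantities, payload bytes are written big-endian, and
the names ({{VIntSketch}}, {{extraBytesFor}}, {{encode}}, {{decode}}) are
hypothetical rather than the API that was committed. Note that if the first byte
is held in a Java {{byte}}, it is promoted to {{int}} before the bit operation,
so the sketch subtracts 24 from the leading-zero count. For clarity the decoder
loops over bytes instead of reading a long and truncating.

```java
final class VIntSketch {
    // Number of bytes that follow the first byte (0..8).
    static int extraBytesFor(long value) {
        int significantBits = 64 - Long.numberOfLeadingZeros(value | 1);
        return Math.min(8, (significantBits - 1) / 7);
    }

    static byte[] encode(long value) {
        int extra = extraBytesFor(value);
        byte[] out = new byte[extra + 1];
        // Write the payload big-endian, least significant byte last.
        for (int i = extra; i >= 0; i--) {
            out[i] = (byte) value;
            value >>>= 8;
        }
        // Set the top `extra` bits of the first byte: the length prefix.
        out[0] |= (0xFF << (8 - extra));
        return out;
    }

    static long decode(byte[] in) {
        byte first = in[0];
        if (first >= 0)
            return first; // top bit clear: single-byte value
        // A negative byte sign-extends to an int whose upper 24 bits are 1,
        // so inverting it leaves 24 leading zeros plus one per set prefix bit.
        int extra = Integer.numberOfLeadingZeros(~first) - 24;
        long value = first & (0xFF >> extra); // drop the length prefix
        for (int i = 1; i <= extra; i++)
            value = (value << 8) | (in[i] & 0xFF);
        return value;
    }

    public static void main(String[] args) {
        long[] samples = { 0L, 127L, 128L, (1L << 56) - 1, 1L << 56, -1L };
        for (long v : samples) {
            byte[] encoded = encode(v);
            System.out.println(Long.toHexString(v) + " -> " + encoded.length
                               + " byte(s), round trip ok: " + (decode(encoded) == v));
        }
    }
}
```

With this layout a reader learns the total length from the first byte alone, and
a first byte of 0xFF means that eight full payload bytes follow, so no value
needs more than 9 bytes.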
> Introduce writeVInt method to DataOutputStreamPlus
> --------------------------------------------------
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Ariel Weisberg
> Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It
> should be possible to make an especially efficient implementation against
> BufferedDataOutputStreamPlus.