[
https://issues.apache.org/jira/browse/AVRO-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933557#comment-17933557
]
ASF subversion and git services commented on AVRO-4067:
-------------------------------------------------------
Commit 11ca5da73cd16aef52b55f0dce814420a7403caa in avro's branch
refs/heads/main from belugabehr
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=11ca5da73 ]
AVRO-4067: Optimize First Byte of Long Decode (#3183)
> Optimize First Byte of Long Decode
> ----------------------------------
>
> Key: AVRO-4067
> URL: https://issues.apache.org/jira/browse/AVRO-4067
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.12.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.13.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Long values are used in many areas of the spec, and a zero value in
> particular occurs often. For example:
>
> {quote}a string is encoded as a long followed by that many bytes of UTF-8
> encoded character data.
> {quote}
> {quote}Arrays are encoded as a series of blocks. Each block consists of a
> long count value, followed by that many array items. A block with count zero
> indicates the end of the array. Each item is encoded per the array’s item
> schema.
> {quote}
> {quote}Maps are encoded as a series of blocks. Each block consists of a long
> count value, followed by that many key/value pairs. A block with count zero
> indicates the end of the map. Each item is encoded per the map’s value schema.
> {quote}
> Because of this, long values tend to be small on average and can often fit
> within the first byte of the variable-length encoding. Therefore, decoding
> the first byte should be prioritized.
> For the first byte, if the high-order bit is set, then not only does it mean
> there are more bytes to follow, but also that the signed value of the byte is
> negative. Conversely, if the signed value of the first byte is non-negative
> (>= 0), there are no more bytes to follow.
> Check the first byte: if it is non-negative, exit early, and if it is zero,
> return zero.
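
The first-byte check described above can be sketched roughly as follows. This
is only an illustration, not the code committed in #3183: the class and method
names (LongDecodeSketch, readLong) are hypothetical, and it reads from a plain
InputStream, which is an assumption made to keep the example self-contained.
It relies on the zig-zag, variable-length long encoding from the Avro spec.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of the Avro API.
final class LongDecodeSketch {

    /** Reads one zig-zag, variable-length long from the stream. */
    static long readLong(InputStream in) throws IOException {
        int first = in.read();
        if (first < 0) {
            throw new EOFException();
        }
        // Reinterpret as a signed byte: high-order bit set <=> negative.
        byte b = (byte) first;
        if (b >= 0) {
            // Fast path: no continuation bit, so this byte is the whole value.
            if (b == 0) {
                return 0L;
            }
            // Zig-zag decode the single byte (covers -64..63).
            return (b >>> 1) ^ -(b & 1);
        }

        // Slow path: accumulate 7 bits per byte, least significant group first.
        long n = b & 0x7F;
        int shift = 7;
        while (true) {
            int next = in.read();
            if (next < 0) {
                throw new EOFException();
            }
            n |= (long) (next & 0x7F) << shift;
            if ((next & 0x80) == 0) {
                break; // continuation bit clear: last byte of this value
            }
            shift += 7;
            if (shift > 63) {
                // More than 10 bytes cannot encode a 64-bit value.
                throw new IOException("Invalid long encoding");
            }
        }
        // Zig-zag decode the accumulated value.
        return (n >>> 1) ^ -(n & 1);
    }

    /** Convenience wrapper for decoding from a byte array. */
    static long readLong(byte[] bytes) throws IOException {
        return readLong(new ByteArrayInputStream(bytes));
    }
}
```

With this sketch, a string length of zero or an end-of-array marker (a single
0x00 byte) returns after one comparison, and any value from -64 to 63 returns
without entering the loop.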