[ 
https://issues.apache.org/jira/browse/AVRO-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933557#comment-17933557
 ] 

ASF subversion and git services commented on AVRO-4067:
-------------------------------------------------------

Commit 11ca5da73cd16aef52b55f0dce814420a7403caa in avro's branch 
refs/heads/main from belugabehr
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=11ca5da73 ]

AVRO-4067: Optimize First Byte of Long Decode (#3183)



> Optimize First Byte of Long Decode
> ----------------------------------
>
>                 Key: AVRO-4067
>                 URL: https://issues.apache.org/jira/browse/AVRO-4067
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.12.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.13.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Long values are used in many areas of the spec, and a zero value in
> particular comes up often. For example:
>  
> {quote}a string is encoded as a long followed by that many bytes of UTF-8 
> encoded character data.
> {quote}
> {quote}Arrays are encoded as a series of blocks. Each block consists of a 
> long count value, followed by that many array items. A block with count zero 
> indicates the end of the array. Each item is encoded per the array’s item 
> schema.
> {quote}
> {quote}Maps are encoded as a series of blocks. Each block consists of a long 
> count value, followed by that many key/value pairs. A block with count zero 
> indicates the end of the map. Each item is encoded per the map’s value schema.
> {quote}
> Because of this, long values tend to be small on average, and so can often
> fit within the first byte of the variable-length encoding.
> Therefore, the first byte should be prioritized.
> For the first byte, if the high-order bit is set, then not only are there
> more bytes to follow, but the signed value of the byte is also negative.
> Conversely, if the signed value of the byte is non-negative (>= 0), there
> are no more bytes to follow.
> Check the first byte: if it is zero, return zero; if it is positive, decode
> that single byte and exit early.
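
The first-byte check described above can be sketched as follows. This is a
minimal illustration, not the code from the commit: the class VarLongSketch
and the methods encodeLong and decodeLong are hypothetical names, and the
sketch reads from a plain byte array rather than Avro's decoder buffer. It
assumes the zig-zag variable-length encoding that the Avro spec uses for
long values.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch; names and structure are assumptions, not Avro's API.
final class VarLongSketch {

  /** Zig-zag varint encoder, included only to produce input for the decoder. */
  static byte[] encodeLong(long value) {
    // Zig-zag maps small magnitudes (positive or negative) to small unsigned values.
    long n = (value << 1) ^ (value >> 63);
    ByteArrayOutputStream out = new ByteArrayOutputStream(10);
    while ((n & ~0x7FL) != 0) {
      out.write((int) ((n & 0x7F) | 0x80)); // 7 payload bits plus continuation bit
      n >>>= 7;
    }
    out.write((int) n); // last byte: continuation bit clear
    return out.toByteArray();
  }

  /** Decodes one zig-zag varint long starting at buf[pos]. */
  static long decodeLong(byte[] buf, int pos) {
    byte first = buf[pos];
    if (first == 0) {
      return 0L; // most common case: empty string, end of array or map
    }
    if (first > 0) {
      // High bit clear, so this is the only byte; undo zig-zag and exit early.
      return (first >>> 1) ^ -(first & 1);
    }
    // Slow path: the signed byte is negative, so more bytes follow.
    long n = first & 0x7F;
    int shift = 7;
    int b;
    do {
      if (shift > 63) {
        throw new IllegalArgumentException("Invalid long encoding");
      }
      b = buf[++pos];
      n |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while (b < 0); // a negative signed byte means the continuation bit is set
    return (n >>> 1) ^ -(n & 1); // undo zig-zag
  }

  public static void main(String[] args) {
    long[] samples = {0L, 1L, -1L, 63L, -64L, 64L, 1_000_000L, Long.MIN_VALUE};
    for (long v : samples) {
      byte[] enc = encodeLong(v);
      System.out.println(v + " -> " + enc.length + " byte(s), decoded " + decodeLong(enc, 0));
    }
  }
}
```

In this sketch the zero check is purely a shortcut: the positive-byte branch
would also return zero for a zero byte, so the separate test only saves the
shift and XOR in the most common case. Values from -64 to 63 fit in a single
byte under zig-zag encoding and never reach the loop.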



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
