belugabehr opened a new pull request, #3279:
URL: https://github.com/apache/avro/pull/3279

   As part of my earlier work for AVRO-4074, I introduced a buffer to store strings during serialization. I chose a buffer size of 128 bytes somewhat arbitrarily, simply because it is a power of 2. On further reflection, however, 127 is a better partition point. A string is encoded in two parts:
   
   > a string is encoded as a long followed by that many bytes of UTF-8 encoded character data.
   
   For the binary format of Avro:
   
   > int and long values are written using [variable-length](https://lucene.apache.org/java/3_5_0/fileformats.html#VInt) [zig-zag](https://code.google.com/apis/protocolbuffers/docs/encoding.html#types) coding.
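The two coding steps named in the quoted spec can be sketched in Java as below. This is a minimal illustration under stated assumptions, not Avro's `BinaryEncoder` implementation; the class `ZigZagVarint` and the methods `zigZag` and `encodeLong` are hypothetical names. Zig-zag maps signed values onto unsigned ones so that small magnitudes stay small (0 to 0, -1 to 1, 1 to 2, and so on), and the variable-length step then emits 7 bits per byte, low bits first, setting the high bit on every byte except the last.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper illustrating Avro's binary long encoding; not the Avro library API.
public final class ZigZagVarint {
    private ZigZagVarint() {}

    // Zig-zag: interleave negatives and positives (0->0, -1->1, 1->2, -2->3, ...).
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Encode a long as zig-zag followed by a base-128 varint (low 7 bits first).
    public static byte[] encodeLong(long n) {
        long v = zigZag(n);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // While more than 7 significant bits remain, emit 7 bits with the continuation bit set.
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7; // unsigned shift so the loop also terminates for zig-zagged Long.MIN_VALUE
        }
        out.write((int) v); // final byte, continuation bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (long n : new long[] {0, -1, 1, -64, 64}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encodeLong(n)) {
                hex.append(String.format("%02x ", b));
            }
            System.out.println(n + " -> " + hex.toString().trim());
        }
    }
}
```

Running `main` prints `0 -> 00`, `-1 -> 01`, `1 -> 02`, `-64 -> 7f` and `64 -> 80 01`, the same values as the examples table in the Avro specification.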
   
   127 bytes is the largest ASCII string that can be written using only a single byte for its variable-length size prefix, which makes it a more sensible upper limit for this String buffer.
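To check a candidate buffer boundary against the size of the length prefix, the sketch below encodes a string as the quoted spec describes (a zig-zag varint length, then the UTF-8 bytes) and reports how many bytes the prefix occupies. It is a self-contained illustration; `AvroStringSketch`, `encodeString` and `lengthPrefixSize` are hypothetical names, not part of the Avro Java API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of Avro's binary string layout; not the Avro library API.
public final class AvroStringSketch {
    private AvroStringSketch() {}

    // Bytes needed for the zig-zag varint encoding of a non-negative length.
    public static int lengthPrefixSize(int length) {
        long v = ((long) length) << 1; // zig-zag of a non-negative n is 2n
        int size = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            size++;
        }
        return size;
    }

    // Encode a string as: zig-zag varint byte length, then the UTF-8 bytes.
    public static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        long v = ((long) utf8.length) << 1; // zig-zag of the (non-negative) length
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (int len : new int[] {63, 64, 127, 128}) {
            System.out.println(len + " bytes -> " + lengthPrefixSize(len) + "-byte length prefix");
        }
    }
}
```

Because zig-zag maps a non-negative length n to 2n before the varint step, `lengthPrefixSize` returns 1 for lengths up to 63 and 2 for lengths from 64 through 8191; the Avro specification's own table likewise shows 64 encoding as `80 01`, and its string example encodes `"foo"` as `06 66 6f 6f`.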


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.