belugabehr opened a new pull request, #3279: URL: https://github.com/apache/avro/pull/3279
As part of my earlier work for AVRO-4074, I introduced a buffer to store strings during serialization. I chose a buffer size of 128 bytes somewhat arbitrarily: it is a power of 2. However, upon further reflection, 127 is a better boundary.

A string is decomposed into two fields:

> a string is encoded as a long followed by that many bytes of UTF-8 encoded character data.

For the binary format of Avro:

> int and long values are written using [variable-length](https://lucene.apache.org/java/3_5_0/fileformats.html#VInt) [zig-zag](https://code.google.com/apis/protocolbuffers/docs/encoding.html#types) coding.

127 is the largest value that can be written using only a single byte of the variable-length encoding, since each byte carries seven payload bits. This makes it a more sensible upper limit for this String buffer.
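
To make the boundary concrete, here is a minimal Java sketch that counts the bytes needed for a string's length prefix. It is not Avro's implementation: the class and method names are hypothetical and exist only for illustration. It reports both the plain variable-length size and the size after zig-zag coding, because the zig-zag step maps a non-negative length `n` to `2n` before the variable-length step, so the one-byte cutoff for the encoded prefix falls at 63 rather than 127.

```java
// Illustrative sketch only; names are hypothetical and not part of Avro's API.
final class LengthPrefixSize {

    // Zig-zag maps signed longs to unsigned values so that small magnitudes
    // stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Number of bytes a base-128 variable-length encoding needs for an
    // unsigned value (seven payload bits per byte).
    static int varintSize(long unsigned) {
        int size = 1;
        while ((unsigned & ~0x7FL) != 0) {
            unsigned >>>= 7;
            size++;
        }
        return size;
    }

    // Bytes used by the length prefix of a string holding byteLength UTF-8
    // bytes, assuming the length is zig-zag coded and then variable-length coded.
    static int prefixSize(long byteLength) {
        return varintSize(zigZag(byteLength));
    }

    public static void main(String[] args) {
        for (long len : new long[] {63, 64, 127, 128}) {
            System.out.println("length " + len
                    + ": plain varint bytes = " + varintSize(len)
                    + ", zig-zag varint bytes = " + prefixSize(len));
        }
    }
}
```

Running `main` prints the prefix size on either side of each candidate boundary, which is a quick way to check a proposed buffer limit against the encoding.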