Utf8 allocates new byte array unnecessarily
-------------------------------------------
Key: AVRO-1041
URL: https://issues.apache.org/jira/browse/AVRO-1041
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.6.2
Reporter: dave irving
Priority: Minor
When a {{Utf8}} instance is about to receive new data (e.g. in
{{BinaryDecoder}}), {{Utf8::setByteLength}} is invoked, essentially to ensure
the capacity of the backing byte array.
However, the required size is compared against the logical length of the
current instance rather than the length of the existing byte array.
This causes needless allocations of a new backing byte array: if you read a
10-byte string followed by an 8-byte string followed by a 9-byte string, the
third read will cause a new backing array allocation even though the instance
already has a 10-byte array at its disposal.
At a minimum we should replace:
{code}
public Utf8 setByteLength(int newLength) {
  if (this.length < newLength) {
    byte[] newBytes = new byte[newLength];
    System.arraycopy(bytes, 0, newBytes, 0, this.length);
    this.bytes = newBytes;
  }
  ...
}
{code}
with:
{code}
public Utf8 setByteLength(int newLength) {
  if (this.bytes.length < newLength) {
    byte[] newBytes = new byte[newLength];
    System.arraycopy(bytes, 0, newBytes, 0, this.length);
    this.bytes = newBytes;
  }
  ...
}
{code}
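To make the difference concrete, here is a rough standalone sketch that counts backing array allocations for the 10, 8, 9 byte sequence under both checks. {{Utf8ResizeSketch}}, {{Buf}} and {{allocationsFor}} are hypothetical names made up for this illustration and are not part of Avro. The sketch also assumes that the elided part of {{setByteLength}} stores {{newLength}} as the new logical length.

```java
final class Utf8ResizeSketch {

    // Hypothetical stand-in for Utf8; only the resize logic is modeled.
    static final class Buf {
        byte[] bytes = new byte[0];
        int length;                     // logical length of the current value
        int allocations;                // backing arrays allocated by resizes
        final boolean compareCapacity;  // true: proposed check, false: current check

        Buf(boolean compareCapacity) {
            this.compareCapacity = compareCapacity;
        }

        Buf setByteLength(int newLength) {
            // Current code compares against the logical length;
            // the proposed code compares against the array capacity.
            int current = compareCapacity ? bytes.length : length;
            if (current < newLength) {
                byte[] newBytes = new byte[newLength];
                System.arraycopy(bytes, 0, newBytes, 0, length);
                bytes = newBytes;
                allocations++;
            }
            // Assumption: the elided part of the real method does this.
            length = newLength;
            return this;
        }
    }

    // Runs a sequence of resizes and reports how many arrays were allocated.
    static int allocationsFor(boolean compareCapacity, int... sizes) {
        Buf buf = new Buf(compareCapacity);
        for (int size : sizes) {
            buf.setByteLength(size);
        }
        return buf.allocations;
    }

    public static void main(String[] args) {
        System.out.println("logical length check: " + allocationsFor(false, 10, 8, 9));
        System.out.println("capacity check:       " + allocationsFor(true, 10, 8, 9));
    }
}
```

In this sketch the logical length check allocates twice for the sequence (once for the 10-byte read and again for the 9-byte read), while the capacity check allocates only once.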
We may also wish to consider a maximum retained size for the backing array of
a {{Utf8}} instance: if the array has grown beyond this limit, we drop it the
next time a resize is requested for a data length below the limit (so we
aren't forced to keep memory for the largest string ever encountered).
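One possible shape for such a limit is sketched below. This is only an illustration of the idea, not a proposed patch: {{BoundedUtf8Sketch}}, {{MAX_RETAINED}} and {{capacity()}} are hypothetical names, the 1024-byte threshold is arbitrary, and the sketch again assumes that {{setByteLength}} stores {{newLength}} as the logical length.

```java
final class BoundedUtf8Sketch {

    // Hypothetical threshold: arrays larger than this are not kept around
    // once a smaller value is requested. The number is arbitrary.
    static final int MAX_RETAINED = 1024;

    private byte[] bytes = new byte[0];
    private int length; // logical length

    BoundedUtf8Sketch setByteLength(int newLength) {
        if (bytes.length < newLength) {
            // Grow: not enough capacity, so allocate and keep existing content.
            byte[] newBytes = new byte[newLength];
            System.arraycopy(bytes, 0, newBytes, 0, length);
            bytes = newBytes;
        } else if (bytes.length > MAX_RETAINED && newLength <= MAX_RETAINED) {
            // Shrink: the array is oversized and the new value fits under the
            // limit, so release the large array instead of retaining it.
            byte[] newBytes = new byte[newLength];
            System.arraycopy(bytes, 0, newBytes, 0, Math.min(length, newLength));
            bytes = newBytes;
        }
        length = newLength;
        return this;
    }

    int getByteLength() {
        return length;
    }

    // Exposed only so the sketch can be inspected.
    int capacity() {
        return bytes.length;
    }
}
```

With this shape, a single 5000-byte string no longer pins a 5000-byte array for the lifetime of the instance, while ordinary small strings still reuse the existing array. The trade-off is one extra allocation each time the sizes cross the limit in the downward direction.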