Tie Liu created AVRO-1411:
-----------------------------
Summary: org.apache.avro.util.Utf8 performance improvement by
remove private Charset in class
Key: AVRO-1411
URL: https://issues.apache.org/jira/browse/AVRO-1411
Project: Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.7.5
Reporter: Tie Liu
Priority: Minor
The org.apache.avro.util.Utf8 class has a private static field defined as:
    private static final Charset UTF8 = Charset.forName("UTF-8");
which is used as follows:
    public static final byte[] getBytesFor(String str) {
        return str.getBytes(UTF8);
    }
I guess the intention of creating this object is to avoid repeated object
creation. However, looking into the String.getBytes code, when it is called
with a Charset it actually creates a new StringEncoder on every call, in
java.lang.StringCoding:
    static byte[] encode(Charset cs, char[] ca, int off, int len) {
        StringEncoder se = new StringEncoder(cs, cs.name());
        char[] c = Arrays.copyOf(ca, ca.length);
        return se.encode(c, off, len);
    }
If we instead call getBytes with the charset name as a string literal, "UTF-8",
it reuses the thread-local StringEncoder rather than allocating a new one.
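The change described above could be sketched roughly as follows. This is only
an illustration, not the actual Avro patch: the class name Utf8BytesSketch and
the choice of IllegalStateException are assumptions, and the allocation
behaviour applies to the JDK implementation quoted in this issue. Note that
String.getBytes(String) declares the checked UnsupportedEncodingException, so
the method has to handle it.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical stand-in for org.apache.avro.util.Utf8, showing only the
// method discussed in this issue.
class Utf8BytesSketch {

    // Passes the charset name rather than a Charset object, so that the JDK
    // code path quoted above (a new StringEncoder per call) is not taken.
    public static byte[] getBytesFor(String str) {
        try {
            return str.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a standard charset that every Java platform must
            // support, so this branch is not expected to be reached.
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    public static void main(String[] args) {
        // 'h', 'l', 'l', 'o' take one byte each; U+00E9 takes two.
        byte[] bytes = getBytesFor("h\u00e9llo");
        System.out.println(bytes.length); // prints 6
    }
}
```

The returned bytes are the same as with the Charset overload; only the
encoder lookup inside the JDK differs.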
We tried overriding this class with a version that passes the string literal,
and confirmed that those short-lived StringEncoder objects are no longer
created. We would like Apache to fix this so that we no longer need to
override the class.