Daryn Sharp created HDFS-10662:
----------------------------------
Summary: Optimize UTF8 string/byte conversions
Key: HDFS-10662
URL: https://issues.apache.org/jira/browse/HDFS-10662
Project: Hadoop HDFS
Issue Type: Sub-task
Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp
String/byte conversions may take either a Charset instance or its canonical
name. One might expect the Charset instance to be faster because it avoids
looking up and instantiating a Charset, but it is not. The canonical-name
variants cache the string encoder/decoder (obtained from a Charset), which
results in better performance.
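
For illustration only, here is a minimal sketch of the two call styles being
compared. The class and method names (Utf8Conversions, encodeByName, and so
on) and the sample path are hypothetical and are not taken from the HDFS code
base. Both styles produce identical results; the difference discussed here is
only in how the JDK 7/8 runtime obtains the encoder/decoder.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of HDFS.
public class Utf8Conversions {

    // Canonical-name variant: the style this issue proposes to use.
    public static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Canonical-name variant for decoding.
    public static String decodeByName(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Charset-instance variant: the style being compared against.
    public static byte[] encodeByCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Charset-instance variant for decoding.
    public static String decodeByCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String path = "/user/hadoop/data"; // hypothetical sample path
        byte[] byName = encodeByName(path);
        byte[] byCharset = encodeByCharset(path);
        // Both variants yield the same bytes and round-trip to the same string.
        System.out.println(Arrays.equals(byName, byCharset));
        System.out.println(decodeByName(byName).equals(decodeByCharset(byCharset)));
    }
}
```

The name-based variants declare the checked UnsupportedEncodingException, so
a small wrapper like the one above keeps call sites clean.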
LOG4J2-935 describes a real-world performance boost. In my micro-benchmarks
on JDK 7/8 the runtime improvement was marginal. However, for a 16-byte path,
using the canonical name generated 50% less garbage. For a 64-byte path, 25%
of the garbage. Given the sheer number of times that paths are (re)parsed,
the cost adds up quickly.