Junegunn Choi created HBASE-15569:
-------------------------------------
Summary: Make Bytes.toStringBinary faster
Key: HBASE-15569
URL: https://issues.apache.org/jira/browse/HBASE-15569
Project: HBase
Issue Type: Improvement
Components: Performance
Reporter: Junegunn Choi
Assignee: Junegunn Choi
Priority: Minor
{{Bytes.toStringBinary}} is quite expensive due to its use of {{String.format}}.
{{String.format}} seems like overkill for this purpose, and I was able to make
the function up to 45 times faster by replacing that part with simpler
hand-crafted code.
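To illustrate the idea, here is a minimal sketch of escaping bytes with a hex lookup table instead of {{String.format}}. It is not the attached patch: the class name {{HexEscapeSketch}}, the method {{escape}}, and the rule that "printable" means ASCII 0x20 to 0x7E excluding the backslash are assumptions made for this example.

```java
final class HexEscapeSketch {
    // Lookup table for upper-case hex digits (hypothetical helper, not from the patch).
    private static final char[] HEX_UPPER = "0123456789ABCDEF".toCharArray();

    /** Escapes non-printable bytes as \xNN without calling String.format. */
    static String escape(byte[] b) {
        // Worst case: every byte expands to the four characters "\xNN".
        StringBuilder sb = new StringBuilder(b.length * 4);
        for (byte value : b) {
            int ch = value & 0xFF; // treat the byte as unsigned
            // Assumption: printable means ASCII 0x20..0x7E, excluding the backslash.
            if (ch < ' ' || ch > '~' || ch == '\\') {
                sb.append('\\').append('x')
                  .append(HEX_UPPER[ch >>> 4])     // high nibble
                  .append(HEX_UPPER[ch & 0x0F]);   // low nibble
            } else {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints a\x00\xFF\x5C
        System.out.println(escape(new byte[] {'a', 0, (byte) 0xFF, '\\'}));
    }
}
```

Two array lookups and a few {{append}} calls avoid parsing a format string and allocating a formatter for every escaped byte, which is presumably where most of the time goes.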
This is probably a non-issue for the HBase server, as the function is not used
in performance-sensitive contexts, but I figured it wouldn't hurt to make it
faster since it is widely used in built-in tools (Shell, {{HFilePrettyPrinter}}
with the {{-p}} option, etc.) and can also be used in clients.
h4. Background:
We have [an HBase monitoring
tool|https://github.com/kakao/hbase-region-inspector] that periodically
collects region information and calls {{Bytes.toStringBinary}} along the way
to make some of that information suitable for display. Profiling revealed that
a large portion of the processing time was spent in {{String.format}}.
h4. Micro-benchmark:
{code}
byte[] bytes = new byte[256];
for (int i = 0; i < bytes.length; ++i) {
  // Mixture of printable and non-printable characters.
  // Maximal performance gain (45x) is observed when the array is solely
  // composed of non-printable characters.
  bytes[i] = (byte) i;
}
long started = System.nanoTime();
for (int i = 0; i < 1000000; ++i) {
  Bytes.toStringBinary(bytes);
}
System.out.println(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - started));
{code}
- Without the patch: 134176 ms
- With the patch: 3890 ms
I made sure that the new version returns the same value as before and
simplified the check for non-printable characters.
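A check of this kind can be sketched as follows. This is only an illustration, not the test shipped with the patch: the class {{EscapeEquivalenceCheck}} and its methods are hypothetical, and the format string {{"\\x%02X"}} is assumed to be what the old code used.

```java
final class EscapeEquivalenceCheck {
    private static final char[] HEX_UPPER = "0123456789ABCDEF".toCharArray();

    // Reference encoding via String.format (format string is an assumption).
    static String viaFormat(int ch) {
        return String.format("\\x%02X", ch);
    }

    // Hand-crafted encoding via a hex lookup table.
    static String viaTable(int ch) {
        return new String(new char[] {'\\', 'x', HEX_UPPER[ch >>> 4], HEX_UPPER[ch & 0x0F]});
    }

    /** Returns the first byte value whose two encodings differ, or -1 if none do. */
    static int firstMismatch() {
        for (int ch = 0; ch < 256; ch++) {
            if (!viaFormat(ch).equals(viaTable(ch))) {
                return ch;
            }
        }
        return -1;
    }
}
```

Since a byte has only 256 possible values, comparing the two encodings exhaustively is cheap and leaves no untested input for the escaping step.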