[
https://issues.apache.org/jira/browse/AVRO-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883625#comment-17883625
]
David Mollitor commented on AVRO-4049:
--------------------------------------
During some micro-benchmarking, I found that there was a significant overhead
to calling the JDK method Arrays#equals. For short strings, the difference in
performance was two orders of magnitude. I expected some overhead, but was
surprised by the final outcome.
However, for longer strings that share a long common prefix (e.g., 16+
characters), the vectorized method performed 50% better.
So, as a compromise: for short strings, use the existing method; for longer
strings, use the vectorized method.
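A minimal sketch of that compromise, for illustration only. The class name,
the method name, and the cutoff of 16 are assumptions made for this example
and are not taken from the actual patch; the real threshold should come from
benchmarking. The caller is assumed to have already checked that both values
have the same length, and that both arrays hold at least that many bytes.

```java
import java.util.Arrays;

/** Hypothetical helper; not the actual Avro Utf8 implementation. */
final class Utf8EqualsSketch {

  // Assumed cutoff, loosely based on the "16+ characters" observation.
  static final int VECTORIZED_THRESHOLD = 16;

  private Utf8EqualsSketch() {
  }

  /**
   * Compares the first {@code length} bytes of {@code a} and {@code b}.
   * Both arrays must contain at least {@code length} bytes.
   */
  static boolean bytesEqual(byte[] a, byte[] b, int length) {
    if (length < VECTORIZED_THRESHOLD) {
      // Short input: a plain loop avoids the overhead of the JDK call.
      for (int i = 0; i < length; i++) {
        if (a[i] != b[i]) {
          return false;
        }
      }
      return true;
    }
    // Longer input: range-based Arrays.equals, available since JDK 9.
    return Arrays.equals(a, 0, length, b, 0, length);
  }
}
```

The range-based overload is used because a backing byte array may be larger
than the number of bytes actually in use, so only the first length bytes of
each array should be compared.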
> Use JDK Equals in UTF8
> ----------------------
>
> Key: AVRO-4049
> URL: https://issues.apache.org/jira/browse/AVRO-4049
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.13.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Starting in JDK 9, the JDK provides a vectorized way to test if two specified
> arrays of bytes, over the specified ranges, are _equal_ to one another. This
> uses some nice vectorization and in-line methods under the covers.
> [https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Arrays.html#equals(byte%5B%5D,int,int,byte%5B%5D,int,int)]
>
> Use this when determining if two UTF8 Strings are equal.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)