[
https://issues.apache.org/jira/browse/SPARK-21314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-21314:
------------------------------
Issue Type: Improvement (was: Bug)
(not a bug)
I would be surprised if it made much difference, because you can save at most 6
calls per string, but if lots of strings are short, maybe that adds up to a lot.
Your benchmark is basically the best-case scenario. I wonder if you have a
more realistic one?
I'd also be surprised if saving one addition matters much, but it could add up.
> ByteArrayMethods.arrayEquals could use some optimizations
> ---------------------------------------------------------
>
> Key: SPARK-21314
> URL: https://issues.apache.org/jira/browse/SPARK-21314
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Sumedh Wale
> Priority: Minor
> Labels: performance
>
> ByteArrayMethods.arrayEquals is invoked frequently in queries, especially for
> UTF8String comparisons. It shows up as a major cost in many kinds of queries
> involving string values, such as simple filters. Improving it would help a
> wide range of queries.
> The current implementation:
> {code}
>     int i = 0;
>     while (i <= length - 8) {
>       if (Platform.getLong(leftBase, leftOffset + i) !=
>         Platform.getLong(rightBase, rightOffset + i)) {
>         return false;
>       }
>       i += 8;
>     }
>     while (i < length) {
>       if (Platform.getByte(leftBase, leftOffset + i) !=
>         Platform.getByte(rightBase, rightOffset + i)) {
>         return false;
>       }
>       i += 1;
>     }
> {code}
> can be optimized in two ways:
> a) use a getInt comparison on the remaining bytes when possible, which will be
> much faster than four separate byte comparisons
> b) advance the two offsets directly instead of adding "i" to each of them on
> every loop iteration
> With the above changes, the numbers for 15-byte strings look like this:
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Linux 4.4.0-21-generic
> Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
> compare arrayEquals:     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------
> arrayEquals                    1230 / 1255         81.3          12.3       1.0X
> arrayEquals2                    830 /  846        120.4           8.3       1.5X
> {noformat}
> The gains vary from 1.2X to 1.6X for different sizes.
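
For illustration, the two changes proposed in the description, (a) and (b), might look roughly like the sketch below. This is not the reporter's patch: it is a minimal, self-contained version that assumes plain byte[] inputs and uses java.nio.ByteBuffer absolute reads in place of Spark's Platform.getLong/getInt/getByte. The class name ArrayEqualsSketch and the method signature are hypothetical; only the name arrayEquals2 is taken from the benchmark output above.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; ByteBuffer stands in for Spark's Unsafe-based Platform.
final class ArrayEqualsSketch {

    // Returns true if `length` bytes of `left` starting at leftOffset equal
    // `length` bytes of `right` starting at rightOffset.
    static boolean arrayEquals2(byte[] left, int leftOffset,
                                byte[] right, int rightOffset, int length) {
        ByteBuffer l = ByteBuffer.wrap(left);
        ByteBuffer r = ByteBuffer.wrap(right);
        // (b) advance both offsets directly; there is no separate index "i"
        // to add to each offset on every iteration.
        int end = leftOffset + length;
        while (leftOffset <= end - 8) {
            if (l.getLong(leftOffset) != r.getLong(rightOffset)) {
                return false;
            }
            leftOffset += 8;
            rightOffset += 8;
        }
        // (a) at most 7 bytes remain; compare 4 of them with a single int read
        // when possible.
        if (leftOffset <= end - 4) {
            if (l.getInt(leftOffset) != r.getInt(rightOffset)) {
                return false;
            }
            leftOffset += 4;
            rightOffset += 4;
        }
        // At most 3 bytes remain; compare them one at a time.
        while (leftOffset < end) {
            if (left[leftOffset] != right[rightOffset]) {
                return false;
            }
            leftOffset += 1;
            rightOffset += 1;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] a = new byte[15];
        byte[] b = new byte[15];
        for (int k = 0; k < a.length; k++) {
            a[k] = (byte) k;
            b[k] = (byte) k;
        }
        System.out.println(arrayEquals2(a, 0, b, 0, a.length));
        b[14] = 0; // change the last byte so the byte-wise tail loop sees a mismatch
        System.out.println(arrayEquals2(a, 0, b, 0, a.length));
    }
}
```

For a 15-byte input this performs one long read, one int read and three byte reads per side, where the current implementation performs one long read and seven byte reads. Whether that difference matters on realistic workloads is the open question raised in the comment above.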