Sumedh Wale created SPARK-21314:
-----------------------------------
Summary: ByteArrayMethods.arrayEquals could use some optimizations
Key: SPARK-21314
URL: https://issues.apache.org/jira/browse/SPARK-21314
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.0, 2.0.0
Reporter: Sumedh Wale
Priority: Minor
ByteArrayMethods.arrayEquals is invoked frequently in queries, especially for
UTF8String comparisons. It shows up as a major contributor to the cost of many
kinds of queries involving string values, such as simple filters. Improving it
will help a wide range of queries.
The current implementation:
{code}
int i = 0;
while (i <= length - 8) {
  if (Platform.getLong(leftBase, leftOffset + i) !=
      Platform.getLong(rightBase, rightOffset + i)) {
    return false;
  }
  i += 8;
}
while (i < length) {
  if (Platform.getByte(leftBase, leftOffset + i) !=
      Platform.getByte(rightBase, rightOffset + i)) {
    return false;
  }
  i += 1;
}
{code}
can be optimized in two ways:
a) use a getInt comparison for the remaining bytes when possible, which will be
much faster than four single-byte comparisons
b) advance the left and right offsets individually instead of adding "i" to each
of them on every loop iteration
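
As a rough illustration of (a) and (b), here is a minimal, self-contained sketch.
It is not the proposed patch: it uses java.nio.ByteBuffer over plain byte arrays
as a stand-in for Platform.getLong/getInt/getByte so that it runs without Spark,
and the class name ArrayEqualsSketch and the int offsets are assumptions made
for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the optimized arrayEquals; not Spark code.
final class ArrayEqualsSketch {
    private ArrayEqualsSketch() {}

    static boolean arrayEquals(byte[] left, int leftOffset,
                               byte[] right, int rightOffset, int length) {
        // ByteBuffer views replace Platform.getXxx(base, offset) in this sketch.
        // Both views use the same byte order, so equality results are unaffected.
        ByteBuffer l = ByteBuffer.wrap(left).order(ByteOrder.nativeOrder());
        ByteBuffer r = ByteBuffer.wrap(right).order(ByteOrder.nativeOrder());

        // (b) advance the two offsets directly instead of computing
        // leftOffset + i and rightOffset + i on every iteration.
        int leftPos = leftOffset;
        int rightPos = rightOffset;
        final int leftEnd = leftOffset + length;

        // Compare 8 bytes at a time while at least 8 bytes remain.
        while (leftPos <= leftEnd - 8) {
            if (l.getLong(leftPos) != r.getLong(rightPos)) {
                return false;
            }
            leftPos += 8;
            rightPos += 8;
        }
        // (a) if 4 or more bytes remain, compare them with a single int read.
        if (leftPos <= leftEnd - 4) {
            if (l.getInt(leftPos) != r.getInt(rightPos)) {
                return false;
            }
            leftPos += 4;
            rightPos += 4;
        }
        // At most 3 bytes are left; compare them one by one.
        while (leftPos < leftEnd) {
            if (l.get(leftPos) != r.get(rightPos)) {
                return false;
            }
            leftPos += 1;
            rightPos += 1;
        }
        return true;
    }
}
```

For a 15-byte input, this sketch performs one long comparison, one int
comparison and three byte comparisons, where the current loop performs one long
comparison followed by seven byte comparisons.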
The above changes give numbers like the following for 15-byte strings:
{noformat}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Linux 4.4.0-21-generic
Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
compare arrayEquals:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
arrayEquals                       1230 / 1255         81.3          12.3       1.0X
arrayEquals2                       830 /  846        120.4           8.3       1.5X
{noformat}
The gains vary from 1.2X to 1.6X for different sizes.