clintropolis opened a new pull request, #18967:
URL: https://github.com/apache/druid/pull/18967
Noticed this while stepping through the debugger. The underlying binding
caches its arrays at max vector size to avoid constantly reallocating them,
which causes the binding vector to coerce every element of the array, including
values past the current vector size that might be junk for a given vector. A
new vector object is also created for every batch, so I've modified things to
cache the arrays used for type coercion. This is used at least by casts in
vector expression processing, so it is a bit of a perf improvement, depending
on how much the current vector size differs from the max size.
changes:
* split `ExpressionEvalBindingVector` into
`ExpressionEvalNumericBindingVector` and `ExpressionEvalObjectBindingVector`
* modify `ExpressionEvalNumericBindingVector` and
`ExpressionEvalObjectBindingVector` to use current vector size instead of input
array size when coercing values
* modify `ExpressionEvalNumericBindingVector` and
`ExpressionEvalObjectBindingVector` to use externally managed object array
caches for value coercion instead of recreating each time
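A minimal sketch of the idea behind the last two bullets, assuming a simplified binding that hands out a `long[]` allocated at max vector size. `CoercionCacheSketch`, `coerceToObjects`, and the `coercions` counter are hypothetical names for illustration only, not the classes touched by this PR.

```java
/**
 * Hypothetical illustration, not Druid code: coerce only the first
 * currentSize entries of an oversized input array, writing into an
 * object array that is allocated once and reused across batches.
 */
final class CoercionCacheSketch
{
  // Externally managed cache, allocated once at max vector size.
  private final Object[] objectCache;

  // Counts element coercions so the saved work is visible.
  int coercions = 0;

  CoercionCacheSketch(int maxVectorSize)
  {
    this.objectCache = new Object[maxVectorSize];
  }

  /**
   * Boxes values[0..currentSize) into the shared cache. Entries at or past
   * currentSize are not read, since they may be stale junk from an earlier
   * batch. A null 'nulls' array means no value is null.
   */
  Object[] coerceToObjects(long[] values, boolean[] nulls, int currentSize)
  {
    for (int i = 0; i < currentSize; i++) {
      objectCache[i] = (nulls != null && nulls[i]) ? null : Long.valueOf(values[i]);
      coercions++;
    }
    return objectCache;
  }

  public static void main(String[] args)
  {
    CoercionCacheSketch sketch = new CoercionCacheSketch(8);
    // The input array has max size 8, but only 3 entries are valid for this batch.
    long[] values = {1L, 2L, 3L, 99L, 99L, 99L, 99L, 99L};
    Object[] coerced = sketch.coerceToObjects(values, null, 3);
    System.out.println("coerced[1] = " + coerced[1]);
    System.out.println("element coercions performed: " + sketch.coercions);
  }
}
```

Callers must only read the first `currentSize` entries of the returned array, because the cache is shared between batches and its tail is never cleared.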
perf measurements show some minor improvement
```
SELECT COUNT(*), SUM(CAST(string1 as BIGINT) + CAST(string3 as BIGINT)) FROM
expressions WHERE double3 < 1010.0 AND double3 > 100.0
```
before:
```
Benchmark                        (complexCompression)  (deferExpressionDimensions)  (jsonObjectStorageEncoding)  (query)  (rowsPerSegment)  (schemaType)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt  Score    Error     Units
SqlExpressionBenchmark.querySql  NONE                  singleString                 SMILE                        60       1500000           explicit      MMAP           UTF8              force        avgt  5    33.033   ± 0.682   ms/op
```
after:
```
Benchmark                        (complexCompression)  (deferExpressionDimensions)  (jsonObjectStorageEncoding)  (query)  (rowsPerSegment)  (schemaType)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt  Score    Error     Units
SqlExpressionBenchmark.querySql  NONE                  singleString                 SMILE                        60       1500000           explicit      MMAP           UTF8              force        avgt  5    26.284   ± 0.888   ms/op
```
Also measured a group by just for fun
```
SELECT CAST(string1 as BIGINT) + CAST(string3 as DOUBLE) + long3, COUNT(*)
FROM expressions GROUP BY 1 ORDER BY 2
```
before:
```
Benchmark                        (complexCompression)  (deferExpressionDimensions)  (jsonObjectStorageEncoding)  (query)  (rowsPerSegment)  (schemaType)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt  Score    Error     Units
SqlExpressionBenchmark.querySql  NONE                  singleString                 SMILE                        59       1500000           explicit      MMAP           UTF8              force        avgt  5    285.286  ± 6.212   ms/op
SqlExpressionBenchmark.querySql  NONE                  fixedWidth                   SMILE                        59       1500000           explicit      MMAP           UTF8              force        avgt  5    443.771  ± 13.592  ms/op
SqlExpressionBenchmark.querySql  NONE                  fixedWidthNonNumeric         SMILE                        59       1500000           explicit      MMAP           UTF8              force        avgt  5    455.746  ± 5.016   ms/op
SqlExpressionBenchmark.querySql  NONE                  always                       SMILE                        59       1500000           explicit      MMAP           UTF8              force        avgt  5    440.280  ± 8.857   ms/op
```
after:
```
Benchmark                        (complexCompression)  (deferExpressionDimensions)  (jsonObjectStorageEncoding)  (query)  (rowsPerSegment)  (schemaType)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt  Score    Error     Units
SqlExpressionBenchmark.querySql  NONE                  singleString                 SMILE                        59       1500000           explicit      MMAP           UTF8              force        avgt  5    269.656  ± 7.428   ms/op
SqlExpressionBenchmark.querySql  NONE                  fixedWidth                   SMILE                        59       1500000           explicit      MMAP           UTF8              force        avgt  5    443.834  ± 7.044   ms/op
SqlExpressionBenchmark.querySql  NONE                  fixedWidthNonNumeric         SMILE                        59       1500000           explicit      MMAP           UTF8              force        avgt  5    449.640  ± 7.384   ms/op
SqlExpressionBenchmark.querySql  NONE                  always                       SMILE                        59       1500000           explicit      MMAP           UTF8              force        avgt  5    450.602  ± 12.424  ms/op
```
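For reading the numbers above, here is a small sketch that computes the relative change in mean score and checks whether the score ± error ranges overlap. It assumes the JMH `Error` column can be treated as a half-width interval around the score. `BenchmarkDeltaSketch` and its methods are hypothetical helpers, not part of the benchmark suite.

```java
/**
 * Hypothetical helper for comparing before/after JMH scores (ms/op, lower is better).
 */
final class BenchmarkDeltaSketch
{
  /** Relative change in percent; negative means the 'after' run was faster. */
  static double percentChange(double before, double after)
  {
    return (after - before) / before * 100.0;
  }

  /** True when [before - beforeErr, before + beforeErr] and [after - afterErr, after + afterErr] intersect. */
  static boolean intervalsOverlap(double before, double beforeErr, double after, double afterErr)
  {
    return before - beforeErr <= after + afterErr && after - afterErr <= before + beforeErr;
  }

  static void report(String label, double before, double beforeErr, double after, double afterErr)
  {
    System.out.printf(
        "%-32s %+6.1f%%  ranges overlap: %b%n",
        label,
        percentChange(before, after),
        intervalsOverlap(before, beforeErr, after, afterErr)
    );
  }

  public static void main(String[] args)
  {
    // Scores and errors copied from the benchmark output above.
    report("query 60, singleString", 33.033, 0.682, 26.284, 0.888);
    report("query 59, singleString", 285.286, 6.212, 269.656, 7.428);
    report("query 59, fixedWidth", 443.771, 13.592, 443.834, 7.044);
    report("query 59, fixedWidthNonNumeric", 455.746, 5.016, 449.640, 7.384);
    report("query 59, always", 440.280, 8.857, 450.602, 12.424);
  }
}
```

Under that assumption, the first query (roughly 20% faster) and the `singleString` group by (roughly 5% faster) are the only cases whose ranges do not overlap. The other three group by cases differ by less than their reported error.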