kgyrtkirk opened a new issue, #16856:
URL: https://github.com/apache/druid/issues/16856
While looking into some perf results, it turned out that for queries involving
windows, restoring a `String` object from its `Frame` representation has a
noticeable performance cost:
* string data is stored as `utf8` bytes
* operations like `compare` use an `ObjectColumnAccessorBase` to
access the value
* since the type is `String`, it has to be decoded
* `StringUtils#fromUtf8` allocates a new byte array and copies the bytes into it
* `java.lang.String` constructor is used to decode the bytes from `UTF_8`
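
The steps listed above can be sketched roughly as follows. This is only an illustration of the allocate, copy, decode pattern, not Druid's actual code: the class `Utf8DecodeSketch`, the `decode` helper, and the use of a plain `ByteBuffer` in place of frame memory are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class Utf8DecodeSketch {
    // Hypothetical stand-in for the decode path described above; not Druid's actual code.
    static String decode(ByteBuffer frameMemory, int offset, int length) {
        byte[] copy = new byte[length];               // temporary byte array
        for (int i = 0; i < length; i++) {
            copy[i] = frameMemory.get(offset + i);    // copy the bytes out of the frame
        }
        return new String(copy, StandardCharsets.UTF_8); // decode UTF-8 into a new String
    }

    public static void main(String[] args) {
        byte[] raw = "druid".getBytes(StandardCharsets.UTF_8);
        ByteBuffer frame = ByteBuffer.wrap(raw);
        String a = decode(frame, 0, raw.length);
        String b = decode(frame, 0, raw.length);
        System.out.println(a.equals(b)); // true
        System.out.println(a == b);      // false: every access builds a fresh String
    }
}
```

Each call repeats the full copy and decode, so a comparison-heavy operation that touches the same row many times would pay this cost on every access.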
<details>
<summary>some benchmark results/tries</summary>
I tried forcing the encoding to `ISO-8859-1`, which could have some advantages,
but that would still retain much of the overhead described above
```
// base
Benchmark                                        (rowsPerSegment)  (schema)  (storageType)  Mode  Cnt    Score    Error  Units
SqlWindowFunctionsBenchmark.windowWithSorter               100000      auto           mmap  avgt    5  812.024 ± 24.003  ms/op
SqlWindowFunctionsBenchmark.windowWithoutSorter            100000      auto           mmap  avgt    5  611.424 ± 47.396  ms/op
// ISO-8859-1
Benchmark                                        (rowsPerSegment)  (schema)  (storageType)  Mode  Cnt    Score    Error  Units
SqlWindowFunctionsBenchmark.windowWithSorter               100000      auto           mmap  avgt    5  779.933 ± 29.665  ms/op
SqlWindowFunctionsBenchmark.windowWithoutSorter            100000      auto           mmap  avgt    5  613.465 ± 37.255  ms/op
```
</details>
note: for the Partitioner it's not even relevant whether the String is smaller
or larger - it just needs to know whether it's the same or not; the same cannot
be said about the Sorter.
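
A minimal sketch of what an equality-only check could look like if it worked directly on the stored bytes. It assumes both values are well-formed UTF-8, in which case identical bytes imply identical strings; `Utf8EqualitySketch`, `sameValue` and `utf8` are hypothetical names, and `ByteBuffer` merely stands in for frame memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class Utf8EqualitySketch {
    // Hypothetical helper: true when two UTF-8 encoded values are byte-for-byte identical.
    static boolean sameValue(ByteBuffer a, ByteBuffer b) {
        // ByteBuffer.equals compares the remaining bytes of both buffers; no String is built.
        return a.equals(b);
    }

    // Test helper that produces the encoded form a frame would hold (assumption).
    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(sameValue(utf8("abc"), utf8("abc"))); // true
        System.out.println(sameValue(utf8("abc"), utf8("abd"))); // false
    }
}
```

For the Sorter this is not a drop-in replacement: an unsigned byte-wise comparison of UTF-8 yields code point order, which can differ from `String#compareTo` (UTF-16 code unit order) for characters outside the Basic Multilingual Plane.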
note: might be interesting to try and materialize the frame into object
arrays before processing; so that the conversion only happens once
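
As a rough illustration of that idea, the sketch below decodes a column once into a `String[]` and then sorts row indices against the materialized array, so the comparator never decodes. Everything here is hypothetical: `MaterializeOnceSketch`, its methods, and the `byte[][]` standing in for a frame column are assumptions, not Druid APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MaterializeOnceSketch {
    // Hypothetical: each byte[] stands in for one row's UTF-8 value in a frame column.
    static String[] materialize(byte[][] utf8Column) {
        String[] out = new String[utf8Column.length];
        for (int i = 0; i < utf8Column.length; i++) {
            out[i] = new String(utf8Column[i], StandardCharsets.UTF_8); // decoded once per row
        }
        return out;
    }

    // Sorts row indices by the already-decoded values; no decoding happens in the comparator.
    static Integer[] sortedRowOrder(String[] values) {
        Integer[] rows = new Integer[values.length];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = i;
        }
        Arrays.sort(rows, (x, y) -> values[x].compareTo(values[y]));
        return rows;
    }

    public static void main(String[] args) {
        byte[][] column = {
            "b".getBytes(StandardCharsets.UTF_8),
            "a".getBytes(StandardCharsets.UTF_8),
            "c".getBytes(StandardCharsets.UTF_8),
        };
        String[] values = materialize(column);
        System.out.println(Arrays.toString(sortedRowOrder(values))); // [1, 0, 2]
    }
}
```

The trade-off is extra memory for the materialized array in exchange for a decode count that is linear in the number of rows rather than in the number of comparisons.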
<details>
<summary>some flamegraphs about it</summary>
### the sorter and the partitioner are affected by this

### zoomed in a bit

</details>