kgyrtkirk opened a new issue, #16856:
URL: https://github.com/apache/druid/issues/16856

   
   Looking into some perf results, it turned out that in queries involving window functions, restoring a `String` object from the `Frame` representation has a noticeable performance impact:
   * string data is stored as `utf8` bytes
   * operations like `compare` use an `ObjectColumnAccessorBase` to access the value
   * since the column type is `String`, the bytes have to be decoded
   * `StringUtils#fromUtf8` allocates a new byte array and copies the bytes into it
   * the `java.lang.String` constructor is then used to decode those bytes as `UTF_8`
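To make the cost of the steps listed above concrete, here is a minimal Java sketch of the decode-on-access pattern. It is a hypothetical illustration and not Druid code: a plain `ByteBuffer` with `(offset, length)` pairs stands in for a `Frame` column, and `DecodeOnAccess` is a made-up class. The real path goes through `ObjectColumnAccessorBase` and `StringUtils#fromUtf8`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

// Hypothetical illustration of per-access decoding; NOT Druid's Frame or
// ObjectColumnAccessorBase. A ByteBuffer of UTF-8 bytes plus (offset, length)
// pairs stands in for a string column stored in a frame.
class DecodeOnAccess {

    // Mirrors the steps above: copy the bytes into a fresh array, then let the
    // java.lang.String constructor decode them as UTF-8.
    static String decodeAt(ByteBuffer column, int offset, int length) {
        byte[] copy = new byte[length];                   // allocation #1: temporary byte[]
        ByteBuffer view = column.duplicate();             // independent position, shared content
        view.position(offset);
        view.get(copy, 0, length);                        // copy out of the stand-in column buffer
        return new String(copy, StandardCharsets.UTF_8);  // allocation #2 plus UTF-8 decode
    }

    // A comparator built on decoding pays both costs for each side of every
    // comparison, so a comparison sort of n rows decodes O(n log n) strings.
    // Each int[] is a hypothetical {offset, length} reference to one row's value.
    static Comparator<int[]> decodingComparator(ByteBuffer column) {
        return (a, b) -> decodeAt(column, a[0], a[1])
                .compareTo(decodeAt(column, b[0], b[1]));
    }
}
```

The point of the sketch is the shape of the work, not exact numbers: every comparison creates two temporary arrays and two `String` objects that are discarded immediately afterwards.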
   
   
   <details>
   <summary>some benchmark results/tries</summary>
   
   I tried forcing the encoding to `ISO-8859-1`, which gave a small gain for `windowWithSorter` (812 → 780 ms/op) and no change for `windowWithoutSorter`; it still retains most of the steps listed above.
   ```
   // base
   Benchmark                                        (rowsPerSegment)  (schema)  (storageType)  Mode  Cnt    Score    Error  Units
   SqlWindowFunctionsBenchmark.windowWithSorter               100000      auto           mmap  avgt    5  812.024 ± 24.003  ms/op
   SqlWindowFunctionsBenchmark.windowWithoutSorter            100000      auto           mmap  avgt    5  611.424 ± 47.396  ms/op
   
   // ISO-8859-1
   Benchmark                                        (rowsPerSegment)  (schema)  (storageType)  Mode  Cnt    Score    Error  Units
   SqlWindowFunctionsBenchmark.windowWithSorter               100000      auto           mmap  avgt    5  779.933 ± 29.665  ms/op
   SqlWindowFunctionsBenchmark.windowWithoutSorter            100000      auto           mmap  avgt    5  613.465 ± 37.255  ms/op
   ```
   </details>
   
   note: for the Partitioner it's not even relevant whether one String is smaller or greater than another; it only needs to know whether two values are the same. The same cannot be said for the Sorter, which needs an actual ordering.
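A minimal sketch of what comparing the stored UTF-8 bytes directly could look like, without building `String` objects at all. `Utf8Keys` and its methods are hypothetical names, values are assumed to be available as `(array, offset, length)` slices of valid UTF-8, and the range overloads of `Arrays.equals` and `Arrays.compareUnsigned` require Java 9 or later. Equality is enough for the Partitioner; for the Sorter the byte order is only a drop-in replacement if code-point order is acceptable.

```java
import java.util.Arrays;

// Hypothetical helpers, not Druid APIs: compare two UTF-8 encoded values in place,
// given as (array, offset, length) slices, without decoding them into Strings.
class Utf8Keys {

    // Partitioner-style check: for valid UTF-8, two strings are equal exactly when
    // their encoded byte sequences are equal, so no decoding is needed.
    static boolean sameValue(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        return Arrays.equals(a, aOff, aOff + aLen, b, bOff, bOff + bLen);
    }

    // Sorter-style check: unsigned byte-wise comparison of UTF-8 orders strings by
    // Unicode code point. Caveat: String.compareTo orders by UTF-16 code unit, and
    // the two orders disagree for characters outside the Basic Multilingual Plane.
    static int compareCodePoints(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        return Arrays.compareUnsigned(a, aOff, aOff + aLen, b, bOff, bOff + bLen);
    }
}
```

For example, U+FFFF (`EF BF BF`) sorts before U+10000 (`F0 90 80 80`) by code point, while `"\uFFFF".compareTo("\uD800\uDC00")` is positive because the high surrogate `0xD800` is smaller than `0xFFFF`.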
   
   note: it might be interesting to materialize the frame into object arrays before processing, so that the conversion happens only once per value.
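The materialization idea can be sketched roughly as follows, purely as an illustration under stated assumptions: `MaterializeOnce` and its methods are hypothetical names, a `List<byte[]>` of UTF-8 values stands in for a frame column, and null values are not handled. Each row is decoded exactly once; sorting and partition detection then work on cached `String`s.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Druid code: decode a string column once into a String[]
// and run sorting / partitioning on the materialized array.
// Assumes every row holds a non-null UTF-8 value.
class MaterializeOnce {

    // One UTF-8 decode per row, done up front.
    static String[] materialize(List<byte[]> utf8Rows) {
        String[] out = new String[utf8Rows.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = new String(utf8Rows.get(i), StandardCharsets.UTF_8);
        }
        return out;
    }

    // Sorting compares cached Strings; nothing is re-decoded per comparison.
    static String[] sortedCopy(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // On an already sorted column, a new partition starts wherever the value
    // differs from the previous row; only equality is needed here.
    static List<Integer> partitionStarts(String[] sorted) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0 || !sorted[i].equals(sorted[i - 1])) {
                starts.add(i);
            }
        }
        return starts;
    }
}
```

The trade-off is memory: the whole column is held as `String` objects while it is processed, whereas comparing the raw UTF-8 bytes avoids both the decode and the extra objects when only equality is needed.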
   
   <details>
   <summary>some flamegraphs about it</summary>
   
   ### the sorter and the partitioner are affected by this
   
![frame-string](https://github.com/user-attachments/assets/b2cef872-00e3-4b51-b3c1-0d85b3064452)
   
   ### zoomed in a bit
   
![frame-string2](https://github.com/user-attachments/assets/683d0bcc-46dc-4601-8af5-2648aed4b59e)
   
   </details>