[ 
https://issues.apache.org/jira/browse/FLINK-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993382#comment-16993382
 ] 

Roman Grebennikov commented on FLINK-15171:
-------------------------------------------

[~pnowojski] [~roman_khachatryan], while running before/after benchmarks with 
async-profiler, I noticed that the amount of GC activity produced by 
serializerTuple differs noticeably: the new code has much higher GC pressure 
(about +10%) due to intermediate buffer allocations in StringValue.writeString. 
All the benchmarks I ran were done on a Ryzen 7 2700 with 8 physical cores 
(16 with HT), but the Hetzner benchmarking box with an i7 7700 has only 4 
(8 with HT).

Before PR: [^dec05.svg]
 After PR: [^dec11.svg]

As GC threads run concurrently, on the 8-core box they didn't interfere with 
the main benchmark code (most probably they were scheduled on the idle cores). 
But on the 4-core box they started to interfere significantly with the 
benchmark threads, as there are far fewer CPU resources available.

I did yet another round of improvements in the serialization code to avoid 
additional allocations in StringValue.writeString by adding a static 
ThreadLocal buffer for short strings; the same trick was applied to 
StringValue.readString (it looks a bit ugly, as thread-locals are 
dangerous). In any case, this seems to fix the issue with the excessive GC 
pressure. I will open a PR today once I have finished all the 
benchmarks to validate the results.
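
To illustrate the idea, here is a minimal sketch of a per-thread reusable 
buffer for short strings. It is not the actual StringValue code: the class 
name, the 1024-char cutoff, and the simplified 7-bit variable-length encoding 
are all assumptions made for the example.

```java
import java.io.DataOutput;
import java.io.IOException;

/** Hypothetical sketch; names and encoding are assumptions, not Flink's code. */
public class ThreadLocalBufferSketch {

    // Assumed cutoff: strings up to this many chars reuse the per-thread buffer.
    private static final int SHORT_STRING_MAX_CHARS = 1024;

    // Worst case in this encoding: 5 bytes for the length prefix, 3 bytes per char.
    private static final ThreadLocal<byte[]> SHORT_BUFFER =
            ThreadLocal.withInitial(() -> new byte[5 + 3 * SHORT_STRING_MAX_CHARS]);

    public static void writeString(CharSequence cs, DataOutput out) throws IOException {
        int len = cs.length();
        // Short strings reuse the thread's buffer; long ones still allocate.
        byte[] buf = len <= SHORT_STRING_MAX_CHARS
                ? SHORT_BUFFER.get()
                : new byte[5 + 3 * len];

        int pos = writeVarInt(len, buf, 0);
        for (int i = 0; i < len; i++) {
            pos = writeVarInt(cs.charAt(i), buf, pos);
        }
        // A single bulk write; the buffer is not referenced after this call returns.
        out.write(buf, 0, pos);
    }

    // Writes a non-negative int in 7-bit groups, low group first; returns the new position.
    private static int writeVarInt(int value, byte[] buf, int pos) {
        while (value >= 0x80) {
            buf[pos++] = (byte) (value | 0x80);
            value >>>= 7;
        }
        buf[pos++] = (byte) value;
        return pos;
    }
}
```

The buffer is only used within a single writeString call and never escapes 
it, which is what keeps the thread-local safe in this sketch; the memory 
does stay attached to each thread for the thread's lifetime, though.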

> Performance regression in serialisation benchmarks
> --------------------------------------------------
>
>                 Key: FLINK-15171
>                 URL: https://issues.apache.org/jira/browse/FLINK-15171
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Type Serialization System, Benchmarks
>    Affects Versions: 1.10.0
>            Reporter: Piotr Nowojski
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>             Fix For: 1.10.0
>
>         Attachments: dec05.svg, dec11.svg
>
>
> There is a quite significant performance regression in serialisation benchmarks 
> in the commit range 2ecf7ca..9320f34 (which includes FLINK-14346).
> http://codespeed.dak8s.net:8000/timeline/?ben=serializerTuple&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=serializerRow&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=serializerPojo&env=2
> it coincides with the performance improvement for heavy strings
> http://codespeed.dak8s.net:8000/timeline/?ben=serializerHeavyString&env=2
> it might be caused by some accidental change in the benchmarking code 
> (changing parallelism in one benchmark is carried over to the next one?) or in 
> the code itself.
> CC [~rgrebennikov] [~AHeise]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
