[ https://issues.apache.org/jira/browse/FLINK-24459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kurt Young closed FLINK-24459.
------------------------------
    Resolution: Fixed

merged: 624372376a09df53e23bb615e5293a6caa296c47

> Performance improvement of file sink on Nexmark
> -----------------------------------------------
>
>                 Key: FLINK-24459
>                 URL: https://issues.apache.org/jira/browse/FLINK-24459
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.14.0
>            Reporter: Alexander Trushev
>            Assignee: Alexander Trushev
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>         Attachments: after.jfr.zip, after_cpu.png, after_mem.png, 
> before.jfr.zip, before_cpu.png, before_mem.png
>
>
> h3. Context
> {{PartitionPathUtils.escapePathName}} is a simple method: it takes a
> {{String}}, allocates a {{StringBuilder}}, appends each original or escaped
> char, and returns the resulting {{String}}.
> The filesystem sink calls this method several times per element to determine
> the bucket id. As a result, it becomes a hot spot in workloads that write
> intensively to the filesystem, such as
> [nexmark q10|https://github.com/nexmark/nexmark/blob/master/nexmark-flink/src/main/resources/queries/q10.sql].
> On my local machine, char escaping accounts for 9.53% of CPU samples and
> 17.8% of memory allocations of the whole TaskManager process.
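For reference, the method described in the Context can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual Flink source: the class name `EscapeBaseline`, the escape set (loosely modeled on Hive-style partition path escaping) and the `%XX` output format are assumptions for illustration. It shows the three costs the proposal targets: a `String.format` call per escaped char, a default-capacity `StringBuilder`, and a new `String` even when nothing is escaped.

```java
import java.util.BitSet;

// Hypothetical reconstruction of the "before" behavior; not the Flink source.
final class EscapeBaseline {
    // Illustrative escape set (assumption): control chars plus some path-unsafe chars.
    private static final BitSet CHARS_TO_ESCAPE = new BitSet(128);

    static {
        CHARS_TO_ESCAPE.set(1, 0x20); // control chars 0x01..0x1F
        CHARS_TO_ESCAPE.set(0x7F);    // DEL
        for (char c : new char[] {'"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '[', ']', '^'}) {
            CHARS_TO_ESCAPE.set(c);
        }
    }

    static String escapePathName(String path) {
        StringBuilder sb = new StringBuilder(); // default capacity, may have to grow
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (CHARS_TO_ESCAPE.get(c)) {
                // Parses a format string and allocates for every escaped char.
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString(); // always a new String, even if nothing was escaped
    }

    public static void main(String[] args) {
        System.out.println(escapePathName("a:b/c")); // prints a%3Ab%2Fc
    }
}
```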
> h3. Proposal
> {{PartitionPathUtils.escapePathName}} improvements:
> # Use the more efficient {{Integer.toHexString}} instead of {{String.format}}
> # Do not allocate a new string when the original string contains no chars that need escaping
> # Size the {{StringBuilder}} based on the original string length instead of using the default capacity
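The three proposed changes could look roughly like the sketch below. It is not the merged patch; the class name `EscapeOptimized`, the illustrative escape set, and the `+ 16` headroom are assumptions. A first scan returns the input unchanged when nothing needs escaping (item 2), the builder is sized from the input length (item 3), and hex digits come from `Integer.toHexString` with manual zero-padding and upper-casing (item 1).

```java
import java.util.BitSet;
import java.util.Locale;

// Hypothetical sketch of the proposed improvements; not the Flink source.
final class EscapeOptimized {
    // Illustrative escape set (assumption), same as a Hive-style partition escaper might use.
    private static final BitSet CHARS_TO_ESCAPE = new BitSet(128);

    static {
        CHARS_TO_ESCAPE.set(1, 0x20); // control chars 0x01..0x1F
        CHARS_TO_ESCAPE.set(0x7F);    // DEL
        for (char c : new char[] {'"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '[', ']', '^'}) {
            CHARS_TO_ESCAPE.set(c);
        }
    }

    static String escapePathName(String path) {
        // (2) Find the first escapable char; if there is none, return the input itself.
        int first = -1;
        for (int i = 0; i < path.length(); i++) {
            if (CHARS_TO_ESCAPE.get(path.charAt(i))) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return path;
        }
        // (3) Size the builder from the input length; each escape adds 2 chars, so leave headroom.
        StringBuilder sb = new StringBuilder(path.length() + 16);
        sb.append(path, 0, first); // copy the unescaped prefix in one call
        for (int i = first; i < path.length(); i++) {
            char c = path.charAt(i);
            if (CHARS_TO_ESCAPE.get(c)) {
                appendEscaped(sb, c);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // (1) Integer.toHexString instead of String.format; pad to two digits and upper-case by hand.
    private static void appendEscaped(StringBuilder sb, char c) {
        sb.append('%');
        if (c < 0x10) {
            sb.append('0');
        }
        sb.append(Integer.toHexString(c).toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String plain = "2021-10-05";
        System.out.println(escapePathName("a:b/c"));         // prints a%3Ab%2Fc
        System.out.println(escapePathName(plain) == plain); // prints true: no new String
    }
}
```

Returning the input `String` on the fast path is safe because `String` is immutable. The scan records the index of the first escapable char, so the prefix is copied with a single `append` and no char is examined twice.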
> h3. Benefit
> Experiment on a local machine:
> 1 TaskManager with 6 slots, job parallelism 6, Nexmark default configuration
> plus the object reuse option.
> Before: flink-1.14.0
> After: flink-1.14.0 + patch with the improvements
> || Nexmark q10 || Before || After ||
> | CPU samples of escapePathName() (% of all) | 9.53 | 1.64 |
> | Memory allocations by escapePathName() (% of all) | 17.8 | 2.98 |
> | Throughput/Cores (K/s) | 107.64 | 119.42 |
> Diff: CPU *-7.89* percentage points, Memory *-14.82* percentage points, Throughput *+10.9*%
> Profiling reports are in the attachment.
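The Diff line can be checked against the table with a few lines of arithmetic. The CPU and memory figures are shares of the whole process, so their differences are percentage points, while throughput is a relative change. The class and method names below are hypothetical.

```java
import java.util.Locale;

// Recomputes the "Diff" line from the q10 table values.
final class NexmarkDiff {
    // Difference between two shares, in percentage points.
    static double pointDelta(double beforePct, double afterPct) {
        return afterPct - beforePct;
    }

    // Relative change, in percent.
    static double relativeChangePct(double before, double after) {
        return (after / before - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double cpu = pointDelta(9.53, 1.64);
        double mem = pointDelta(17.8, 2.98);
        double throughput = relativeChangePct(107.64, 119.42);
        // Locale.ROOT keeps '.' as the decimal separator.
        System.out.println(String.format(Locale.ROOT,
                "CPU %.2f pp, Memory %.2f pp, Throughput %+.1f%%", cpu, mem, throughput));
        // prints: CPU -7.89 pp, Memory -14.82 pp, Throughput +10.9%
    }
}
```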



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
