[
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830308#comment-17830308
]
Jon Haddad commented on CASSANDRA-19477:
----------------------------------------
Here are some more fun graphs. Read latency, write latency, and load average
are all significantly improved.
!image-2024-03-24-18-16-50-370.png|width=645,height=205!
!image-2024-03-24-18-20-07-734.png|width=723,height=229!
!image-2024-03-24-18-17-48-334.png|width=653,height=210!
> Do not go to disk to get HintsStore.getTotalFileSize
> ----------------------------------------------------
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Hints
> Reporter: Jon Haddad
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed
> significant CPU time (25%) spent in HintsStore.getTotalFileSize. Here's what
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);
> {noformat}
> At a bare minimum here we should create this string up front with the host
> and version and eliminate 2 of the 3 substitutions, but I think it's probably
> faster to use a StringBuilder and avoid the underlying regular expression
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length. It
> looks like this is called once for each hint file on disk for each host we're
> hinting to. In the case of an overloaded cluster, this is significant. It
> would be better if we were to track the file size in memory for each hint
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable during
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>
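For illustration only, the two suggestions in the description (build the file name without String.format, and track hint file sizes in memory instead of calling File.length) could be sketched roughly as below. This is not the committed patch: the class names HintsSizeSketch, Descriptor and Store, and the methods recordSize and remove, are hypothetical stand-ins and not Cassandra's actual HintsDescriptor or HintsStore API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class HintsSizeSketch {

    // Hypothetical stand-in for a hints descriptor. The file name is built
    // once in the constructor, so later lookups never re-parse a format string.
    static final class Descriptor {
        final UUID hostId;
        final long timestamp;
        final int version;
        private final String fileName;

        Descriptor(UUID hostId, long timestamp, int version) {
            this.hostId = hostId;
            this.timestamp = timestamp;
            this.version = version;
            // Same text as String.format("%s-%s-%s.hints", hostId, timestamp, version)
            this.fileName = new StringBuilder(64)
                    .append(hostId).append('-')
                    .append(timestamp).append('-')
                    .append(version).append(".hints")
                    .toString();
        }

        String fileName() {
            return fileName;
        }
    }

    // Hypothetical per-host store that keeps file sizes in memory, so reading
    // the total never touches the filesystem.
    static final class Store {
        private final Map<String, Long> sizes = new ConcurrentHashMap<>();
        private final AtomicLong total = new AtomicLong();

        // Assumed to be called by whatever writes the hint file, after each flush.
        void recordSize(Descriptor descriptor, long bytes) {
            Long previous = sizes.put(descriptor.fileName(), bytes);
            total.addAndGet(bytes - (previous == null ? 0L : previous));
        }

        // Assumed to be called when a hint file is deleted.
        void remove(Descriptor descriptor) {
            Long previous = sizes.remove(descriptor.fileName());
            if (previous != null) {
                total.addAndGet(-previous);
            }
        }

        long getTotalFileSize() {
            return total.get();
        }
    }
}
```

The trade-off in this sketch is that the in-memory total is only as accurate as the calls that update it: every write and delete path has to report sizes, and the map update and the counter update are not one atomic step, so a concurrent reader may briefly see a slightly stale total.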