[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830401#comment-17830401 ]
Stefan Miklosovic commented on CASSANDRA-19477:
-----------------------------------------------
[CASSANDRA-19477-trunk|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19477-trunk]
{noformat}
java17_pre-commit_tests
✓ j17_build 3m 57s
✓ j17_cqlsh_dtests_py311 7m 2s
✓ j17_cqlsh_dtests_py311_vnode 7m 32s
✓ j17_cqlsh_dtests_py38 6m 50s
✓ j17_cqlsh_dtests_py38_vnode 7m 16s
✓ j17_cqlshlib_cython_tests 7m 39s
✓ j17_cqlshlib_tests 6m 31s
✓ j17_dtests 34m 33s
✓ j17_dtests_vnode 35m 10s
✓ j17_jvm_dtests_latest_vnode_repeat 26m 31s
✓ j17_jvm_dtests_repeat 28m 7s
✓ j17_unit_tests 16m 26s
✓ j17_unit_tests_repeat 0m 18s
✓ j17_utests_latest 13m 59s
✓ j17_utests_latest_repeat 0m 13s
✓ j17_utests_oa_repeat 0m 29s
✕ j17_dtests_latest 34m 36s
    offline_tools_test.TestOfflineTools test_sstablelevelreset
    offline_tools_test.TestOfflineTools test_sstableofflinerelevel
    configuration_test.TestConfiguration test_change_durable_writes
    configuration_test.TestConfiguration test_change_durable_writes
✕ j17_jvm_dtests 27m 59s
    org.apache.cassandra.distributed.test.NativeTransportEncryptionOptionsTest testEndpointVerificationEnabledIpNotInSAN TIMEOUTED
✕ j17_jvm_dtests_latest_vnode 22m 44s
    junit.framework.TestSuite org.apache.cassandra.fuzz.harry.integration.model.InJVMTokenAwareExecutorTest TIMEOUTED
✕ j17_utests_oa 13m 58s
    org.apache.cassandra.db.compaction.CompactionsBytemanTest testSSTableNotEnoughDiskSpaceForCompactionGetsDropped
java17_separate_tests
java11_pre-commit_tests
✓ j11_build 7m 57s
✓ j11_cqlsh_dtests_py311 7m 7s
✓ j11_cqlsh_dtests_py311_vnode 10m 13s
✓ j11_cqlsh_dtests_py38 8m 1s
✓ j11_cqlsh_dtests_py38_vnode 10m 25s
✓ j11_cqlshlib_cython_tests 7m 28s
✓ j11_cqlshlib_tests 9m 40s
✓ j11_dtests_vnode 36m 58s
✓ j11_jvm_dtests_latest_vnode 25m 28s
✓ j11_jvm_dtests_latest_vnode_repeat 29m 22s
✓ j11_jvm_dtests_repeat 28m 7s
✓ j11_unit_tests 15m 17s
✓ j11_unit_tests_repeat 0m 30s
✓ j11_utests_latest 16m 56s
✓ j11_utests_latest_repeat 0m 34s
✓ j11_utests_oa 13m 58s
✓ j11_utests_oa_repeat 1m 0s
✓ j11_utests_system_keyspace_directory 18m 1s
✓ j11_utests_system_keyspace_directory_repeat 3m 39s
✓ j17_cqlsh_dtests_py311 7m 6s
✓ j17_cqlsh_dtests_py311_vnode 7m 27s
✓ j17_cqlsh_dtests_py38 6m 51s
✓ j17_cqlsh_dtests_py38_vnode 7m 14s
✓ j17_cqlshlib_cython_tests 7m 38s
✓ j17_cqlshlib_tests 6m 57s
✓ j17_dtests 32m 21s
✓ j17_dtests_vnode 34m 24s
✓ j17_jvm_dtests_latest_vnode 22m 45s
✓ j17_jvm_dtests_latest_vnode_repeat 26m 32s
✓ j17_jvm_dtests_repeat 28m 21s
✓ j17_unit_tests_repeat 0m 16s
✓ j17_utests_latest 15m 34s
✓ j17_utests_latest_repeat 0m 36s
✓ j17_utests_oa 13m 43s
✓ j17_utests_oa_repeat 0m 17s
✕ j11_dtests 37m 26s
    pushed_notifications_test.TestPushedNotifications test_move_single_node_localhost
✕ j11_dtests_latest 40m 40s
    bootstrap_test.TestBootstrap test_bootstrap_with_reset_bootstrap_state
    offline_tools_test.TestOfflineTools test_sstablelevelreset
    offline_tools_test.TestOfflineTools test_sstableofflinerelevel
    configuration_test.TestConfiguration test_change_durable_writes
✕ j11_jvm_dtests 27m 33s
    org.apache.cassandra.fuzz.ring.ConsistentBootstrapTest coordinatorIsBehindTest
✕ j11_simulator_dtests 10m 37s
    org.apache.cassandra.simulator.test.HarrySimulatorTest test
    org.apache.cassandra.simulator.test.ShortPaxosSimulationTest simulationTest
✕ j17_dtests_latest 36m 31s
    bootstrap_test.TestBootstrap test_bootstrap_with_reset_bootstrap_state
    offline_tools_test.TestOfflineTools test_sstablelevelreset
    offline_tools_test.TestOfflineTools test_sstableofflinerelevel
    configuration_test.TestConfiguration test_change_durable_writes
✕ j17_jvm_dtests 26m 11s
    org.apache.cassandra.fuzz.ring.ConsistentBootstrapTest coordinatorIsBehindTest
    org.apache.cassandra.distributed.test.NativeTransportEncryptionOptionsTest testOptionalMtlsModeDoNotAllowNonSSLConnections
    org.apache.cassandra.distributed.test.NativeTransportEncryptionOptionsTest testEndpointVerificationEnabledIpNotInSAN
✕ j17_unit_tests 14m 20s
    org.apache.cassandra.db.guardrails.GuardrailMaximumTimestampTest testEnabledWarn
java11_separate_tests
{noformat}
[java17_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4067/workflows/467f088d-2bb6-4e61-878b-5931043bc654]
[java17_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4067/workflows/5f9cf2c6-369d-4257-8958-2288a70c7ed7]
[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4067/workflows/78bfce10-f3d2-433c-bdc0-7841a7e46244]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4067/workflows/66a60733-9c8a-4c58-893f-e699bae36cda]
> Do not go to disk to get HintsStore.getTotalFileSize
> ----------------------------------------------------
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Hints
> Reporter: Jon Haddad
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed
> significant CPU time (25%) spent in HintsStore.getTotalFileSize. Here's what
> I'm seeing from profiling:
> 10% of CPU time is spent in HintsDescriptor.fileName, which only does this:
>
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);
> {noformat}
> At a bare minimum here we should create this string up front with the host
> and version and eliminate 2 of the 3 substitutions, but I think it's probably
> faster to use a StringBuilder and avoid the underlying regular expression
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length. It
> looks like this is called once for each hint file on disk for each host we're
> hinting to. In the case of an overloaded cluster, this is significant. It
> would be better if we were to track the file size in memory for each hint
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>
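The two suggestions in the description above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the change on the linked branch: HintFileSketch, Descriptor, Store, onBytesWritten and onFileDeleted are hypothetical names made up for the example, and only the format string and the getTotalFileSize name are taken from the issue. It shows the file name being built once by plain concatenation, and per-file sizes being tracked in memory so that the total can be read without touching the filesystem.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; these are not Cassandra's actual classes. */
final class HintFileSketch {

    /** Stand-in for a hints descriptor whose file name is computed once. */
    static final class Descriptor {
        final UUID hostId;
        final long timestamp;
        final int version;
        private final String fileName;

        Descriptor(UUID hostId, long timestamp, int version) {
            this.hostId = hostId;
            this.timestamp = timestamp;
            this.version = version;
            // Concatenation produces the same text as
            // String.format("%s-%s-%s.hints", hostId, timestamp, version)
            // but does not parse a format string on every call.
            this.fileName = hostId + "-" + timestamp + "-" + version + ".hints";
        }

        /** Returns the cached name; no formatting work happens here. */
        String fileName() {
            return fileName;
        }
    }

    /** Stand-in for a per-host store that keeps hint file sizes in memory. */
    static final class Store {
        private final Map<String, AtomicLong> sizes = new ConcurrentHashMap<>();
        private final AtomicLong total = new AtomicLong();

        /** Assumed hook: called by the writer after appending bytes to a file. */
        void onBytesWritten(Descriptor descriptor, long bytes) {
            sizes.computeIfAbsent(descriptor.fileName(), k -> new AtomicLong())
                 .addAndGet(bytes);
            total.addAndGet(bytes);
        }

        /** Assumed hook: called after a hint file has been removed from disk. */
        void onFileDeleted(Descriptor descriptor) {
            AtomicLong removed = sizes.remove(descriptor.fileName());
            if (removed != null)
                total.addAndGet(-removed.get());
            // Note: a write racing with a delete of the same file is not
            // handled here; a real implementation would need to decide how.
        }

        /** Reads the running total; no filesystem call is made. */
        long getTotalFileSize() {
            return total.get();
        }
    }
}
```

The trade-off in this sketch is that the in-memory total is only as accurate as the hooks that feed it, so sizes of files already on disk at startup would have to be loaded once before the counter can be trusted.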