[jira] [Commented] (CASSANDRA-18111) Cache snapshots in memory

Paulo Motta (Jira) Tue, 18 Jun 2024 12:06:04 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856039#comment-17856039
 ]


Paulo Motta commented on CASSANDRA-18111:
-----------------------------------------

{quote}Is there a way to disable this functionality? I briefly took a look at 
the implementation but didn't see anything that would allow you to disable it. 
My concern is that we've seen issues with the amount of snapshots on large 
clusters, so this can be problematic for some clusters by putting additional 
memory pressure on individual hosts.
{quote}
I would like to understand what kind of problems you encountered so we can 
try to address them here if possible. The goal of this ticket is precisely to 
optimize for a large number of snapshots by avoiding an expensive directory 
traversal when snapshots are listed, so I think it would be counterproductive 
to disable this. See CASSANDRA-13338, which is the original motivation for this 
ticket.

We have not considered the memory cost of keeping this snapshot metadata in 
memory, but perhaps this is something to consider for large numbers of 
snapshots. Do you have a ballpark number for a very large number of 
snapshots per node in your experience? 10K, 100K, 1M?

> Cache snapshots in memory
> -------------------------
>
>                 Key: CASSANDRA-18111
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Snapshots
>            Reporter: Paulo Motta
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are 
> scanned to find snapshots, which is inefficient.
> For example, fetching the 
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric 
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via 
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.
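
Steps a) through d) above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual {{SnapshotManager}} implementation: the class name {{SnapshotCacheSketch}}, its method names, and the simplified layout (one directory per snapshot under a single root) are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names and on-disk layout are assumptions, not Cassandra's API.
class SnapshotCacheSketch {
    // b) in-memory collection: snapshot tag -> snapshot directory
    private final Map<String, Path> snapshots = new ConcurrentHashMap<>();

    // a) scan the disk once at initialization.
    // Assumes every child directory of snapshotsRoot is one snapshot.
    void loadFromDisk(Path snapshotsRoot) throws IOException {
        if (!Files.isDirectory(snapshotsRoot)) {
            return;
        }
        try (Stream<Path> children = Files.list(snapshotsRoot)) {
            children.filter(Files::isDirectory)
                    .forEach(dir -> snapshots.put(dir.getFileName().toString(), dir));
        }
    }

    // c) keep the collection in sync when a snapshot is taken or cleared
    void onSnapshotTaken(String tag, Path dir) {
        snapshots.put(tag, dir);
    }

    void onSnapshotCleared(String tag) {
        snapshots.remove(tag);
    }

    // d) drop entries whose directory was removed manually, then list from memory.
    // This does one existence check per cached entry instead of a full traversal.
    List<String> listSnapshots() {
        snapshots.values().removeIf(dir -> !Files.isDirectory(dir));
        return snapshots.keySet().stream().sorted().collect(Collectors.toList());
    }
}
```

In this sketch the memory footprint grows with the number of cached entries, which is the trade-off raised in the comment above.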



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
