[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856211#comment-17856211 ]
Stefan Miklosovic commented on CASSANDRA-18111:
-----------------------------------------------
{code}
@Test
public void testTableMetadataSize()
{
    TableSnapshot tableSnapshot = new TableSnapshot("my_keyspace-name",
                                                    "my_table_name_a_litt_bit_longer",
                                                    UUID.randomUUID(),
                                                    UUID.randomUUID().toString(), // tag
                                                    Instant.now(),
                                                    null,
                                                    Set.of(new File("/a/b/d/c/d/d/sdsd/sdsdsdsd/sdsd/sdsdsdsd/sds/ds/ds/dsd"),
                                                           new File("/a/b/d/c/d/d/sdsd/sdsdsdsd/sdsd/lklklks /sds/ds/ds/dsd")),
                                                    false);

    long size = meter.measure(tableSnapshot);
    long deepSize = meter.measureDeep(tableSnapshot);

    System.out.println(deepSize); // 768

    TableSnapshot tableSnapshot2 = new TableSnapshot("my_keyspace-name",
                                                     "my_table_name_a_litt_bit_longer",
                                                     UUID.randomUUID(),
                                                     UUID.randomUUID().toString(), // tag
                                                     Instant.now(),
                                                     null,
                                                     Set.of(new File("/a/b/d/c/d/d/sdsd/sdsdsdsd/sdsd/sdsdsdsd/sds/ds/ds/dsd")),
                                                     false);

    long deepSize2 = meter.measureDeep(tableSnapshot2);

    System.out.println(deepSize2); // 648

    // 328
    System.out.println(meter.measureDeep(Set.of(new File("/a/b/d/c/d/d/sdsd/sdsdsdsd/sdsd/sdsdsdsd/sds/ds/ds/dsd"))));

    // 120
    System.out.println(meter.measureDeep(Set.of("/a/b/d/c/d/d/sdsd/sdsdsdsd/sdsd/sdsdsdsd/sds/ds/ds/dsd")));
}
{code}
So with two data dirs a snapshot entry takes around 768 bytes, and with one
data dir 648 bytes. If we manage to store a Set of Strings instead of a Set of
Files, that set shrinks by more than half (328 bytes down to 120 in the
measurement above), so we may say that we save about 200 bytes per data dir.
That makes it around 500 bytes per snapshot entry if we average all things out.
So if I count it correctly, you could fit around 200 000 snapshots into 100
MiB; 20k snapshots would take about 10 MiB and 10k snapshots about 5 MiB. I do
not think we need to deal with this ...
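
The back-of-the-envelope numbers above can be reproduced with a small standalone sketch. The class and method names here are hypothetical (they are not part of Cassandra), and the 500 bytes per entry is only the rough average assumed above, not a measured constant.

```java
// Hypothetical helper, not part of Cassandra: reproduces the rough
// memory estimate for cached snapshot entries.
public class SnapshotCacheEstimate
{
    // Assumption: ~500 bytes per cached snapshot entry (rough average from the measurements)
    public static final long BYTES_PER_ENTRY = 500;
    public static final long MIB = 1024L * 1024L;

    // How many entries of the given size fit into a budget expressed in MiB
    public static long entriesThatFit(long budgetMiB, long bytesPerEntry)
    {
        return (budgetMiB * MIB) / bytesPerEntry;
    }

    // How many MiB the given number of entries would occupy
    public static double mibFor(long entries, long bytesPerEntry)
    {
        return (double) (entries * bytesPerEntry) / MIB;
    }

    public static void main(String[] args)
    {
        System.out.println(entriesThatFit(100, BYTES_PER_ENTRY));          // 209715
        System.out.printf("%.2f MiB%n", mibFor(20_000, BYTES_PER_ENTRY));  // 9.54 MiB
        System.out.printf("%.2f MiB%n", mibFor(10_000, BYTES_PER_ENTRY));  // 4.77 MiB
    }
}
```

Under that assumption, 100 MiB holds a bit over 200 000 entries, and 20k or 10k snapshots stay just under 10 MiB and 5 MiB respectively.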
> Cache snapshots in memory
> -------------------------
>
> Key: CASSANDRA-18111
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Snapshots
> Reporter: Paulo Motta
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are
> scanned to find snapshots, which is inefficient.
> For example, fetching the
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.