[ https://issues.apache.org/jira/browse/CASSANDRA-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcus Eriksson updated CASSANDRA-15364:
----------------------------------------
Test and Documentation Plan: circle ci run
Status: Patch Available (was: Open)
[patch|https://github.com/krummas/cassandra/commits/marcuse/15364] - this lists
the files only once and speeds up getExistingFiles by using a
TreeSet containing the file name prefixes. In a rough local benchmark on my
laptop (verifying a LogFile with 2000 remove records in a directory with 2000
sstables), unpatched takes 35s and patched about 150ms.
[circle|https://circleci.com/gh/krummas/workflows/cassandra/tree/marcuse%2F15364]
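
For readers unfamiliar with the idea, here is a minimal sketch of prefix matching with a TreeSet. It is not the code from the patch: the class name PrefixMatchSketch, the method groupByPrefix and the sample file names are hypothetical, and it assumes that no prefix is itself a prefix of another prefix (for example because each one ends with a '-' separator). Under that assumption, a file name that starts with one of the prefixes sorts directly after it, so a single floor() lookup per file name is enough.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: names are illustrative and not taken from Cassandra.
final class PrefixMatchSketch
{
    /**
     * Groups file names by the prefix they start with.
     *
     * Assumption: no prefix is itself a prefix of another prefix
     * (e.g. every prefix ends with a '-' separator).
     *
     * fileNames is meant to come from a single directory listing,
     * e.g. one call to java.io.File.list(), instead of one listing per record.
     */
    static Map<String, List<String>> groupByPrefix(Collection<String> prefixes,
                                                   Collection<String> fileNames)
    {
        TreeSet<String> sorted = new TreeSet<>(prefixes);
        Map<String, List<String>> result = new TreeMap<>();
        for (String name : fileNames)
        {
            // floor() returns the greatest element <= name, or null if there is none.
            // Under the assumption above, if any prefix matches, it is this element.
            String candidate = sorted.floor(name);
            if (candidate != null && name.startsWith(candidate))
                result.computeIfAbsent(candidate, k -> new ArrayList<>()).add(name);
        }
        return result;
    }
}
```

Each lookup costs O(log n) string comparisons against the n prefixes, so matching all file names against all records no longer requires a directory listing or a full file name parse per record.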
> Avoid over scanning data directories in LogFile.verify()
> --------------------------------------------------------
>
> Key: CASSANDRA-15364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15364
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We currently list the data directory for every {{REMOVE}} record in the file
> in {{LogFile.verify()}}. This can get very expensive during startup when we
> call {{LogTransaction.removeUnfinishedLeftovers()}}. In
> {{LogRecord.getExistingFiles(Set<String> absoluteFilePaths)}} we also fully
> parse the file name of each sstable found, although we only need to prefix match.