[ 
https://issues.apache.org/jira/browse/CASSANDRA-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-15364:
----------------------------------------
    Test and Documentation Plan: circle ci run
                         Status: Patch Available  (was: Open)

[patch|https://github.com/krummas/cassandra/commits/marcuse/15364] - this lists 
the data directory only once and improves the performance of 
{{getExistingFiles}} by using a {{TreeSet}} containing the file name prefixes. 
In a rough local benchmark on my laptop (verifying a LogFile with 2000 remove 
records in a directory with 2000 sstables), the unpatched code takes 35s and 
the patched code about 150ms.
[circle|https://circleci.com/gh/krummas/workflows/cassandra/tree/marcuse%2F15364]
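
For readers who want to see how a sorted set of prefixes can replace full file name parsing, here is a minimal sketch. It is not taken from the patch: the class name, method name, and sample file names are hypothetical, and it assumes that no prefix in the set is itself a prefix of another (for example because every prefix ends with a separator such as '-'). Under that assumption, the prefix of a given file name, if present, is the largest set element that sorts at or before the name, so a single {{floor}} lookup per listed file is enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration only; names and structure are not from the patch.
final class PrefixMatchSketch
{
    /**
     * Groups file names from a single directory listing by the prefix they start with.
     * Assumes the prefixes are prefix-free (no prefix starts with another prefix).
     */
    static Map<String, List<String>> groupByPrefix(NavigableSet<String> prefixes, List<String> fileNames)
    {
        Map<String, List<String>> matches = new HashMap<>();
        for (String name : fileNames)
        {
            // floor() returns the greatest element <= name, or null if there is none.
            String candidate = prefixes.floor(name);
            if (candidate != null && name.startsWith(candidate))
                matches.computeIfAbsent(candidate, k -> new ArrayList<>()).add(name);
        }
        return matches;
    }

    public static void main(String[] args)
    {
        // Made-up prefixes and file names, each prefix ending with '-'.
        NavigableSet<String> prefixes = new TreeSet<>(Arrays.asList("mc-1-big-", "mc-10-big-"));
        List<String> listing = Arrays.asList("mc-1-big-Data.db", "mc-1-big-Index.db",
                                             "mc-10-big-Data.db", "mc-2-big-Data.db");
        System.out.println(groupByPrefix(prefixes, listing));
    }
}
```

With n listed files and m prefixes this costs O(n log m) string comparisons after one directory listing, instead of one listing per record. If the prefixes could nest, the {{floor}} lookup would no longer be sufficient and a different lookup would be needed.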

> Avoid over scanning data directories in LogFile.verify()
> --------------------------------------------------------
>
>                 Key: CASSANDRA-15364
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15364
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In {{LogFile.verify()}} we currently list the data directory for every 
> {{REMOVE}} record in the file - this can get very expensive during startup 
> when we call {{LogTransaction.removeUnfinishedLeftovers()}}. In 
> {{LogRecord.getExistingFiles(Set<String> absoluteFilePaths)}} we also fully 
> parse the file names of the sstables found, but here we only need a prefix 
> match.


