[jira] [Updated] (CASSANDRASC-94) Reduce filesystem calls while streaming SSTables

Francisco Guerrero (Jira) Sun, 21 Apr 2024 14:25:04 -0700


     [ 
https://issues.apache.org/jira/browse/CASSANDRASC-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Francisco Guerrero updated CASSANDRASC-94:
------------------------------------------
          Fix Version/s: 1.0
    Source Control Link: 
https://github.com/apache/cassandra-sidecar/commit/4a6b8c9cfe0c6286d12c7d561941a24c25a206ef
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> Reduce filesystem calls while streaming SSTables
> ------------------------------------------------
>
>                 Key: CASSANDRASC-94
>                 URL: https://issues.apache.org/jira/browse/CASSANDRASC-94
>             Project: Sidecar for Apache Cassandra
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Francisco Guerrero
>            Assignee: Francisco Guerrero
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 1.0
>
>
> When streaming snapshotted SSTables from Cassandra Sidecar, Sidecar will 
> perform multiple filesystem calls:
> - Traverse the data directories to determine the keyspace / table path
> - Once found, determine whether the SSTable file exists under the snapshots 
> directory
> - Read the filesystem to obtain the file type and file size
> - Read the requested range of the file and stream it
> The number of filesystem calls is manageable when streaming a single SSTable, 
> but when one or more clients read multiple SSTables, for example during 
> Cassandra Analytics bulk reads, hundreds to thousands of requests are made, 
> and every request performs the filesystem calls above.
> This improvement proposes introducing two caches to reduce the number of 
> system calls while streaming SSTables:
> 1. *Cache all data file locations*: This is cached once and it will not 
> change during the lifecycle of the application. The values come from the 
> Storage Service MBean {{getAllDataFileLocations}} method.
> 2. *Snapshot list cache*: Maintains a cache of recently listed snapshot 
> files under a snapshot directory. This cache avoids accessing the 
> filesystem every time a bulk read client lists the snapshot directory. It is 
> a short-lived cache and can be disabled if the snapshot list is expected to 
> be large.
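
For illustration only, the short-lived snapshot list cache described in item 2 could be sketched roughly as follows. This is not the code from the linked commit: the class name SnapshotListCache, its constructor parameters, and the time-to-live handling are all assumptions made for this example, and it uses only the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: caches directory listings for a short time so that
// repeated list requests for the same snapshot directory skip the filesystem.
public class SnapshotListCache {
    private static final class Entry {
        final List<Path> files;
        final long expiresAtNanos;

        Entry(List<Path> files, long expiresAtNanos) {
            this.files = files;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ConcurrentHashMap<Path, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final boolean enabled;

    // ttlMillis and enabled are assumed configuration knobs, not Sidecar option names.
    public SnapshotListCache(long ttlMillis, boolean enabled) {
        this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
        this.enabled = enabled;
    }

    // Returns the files directly under snapshotDir, reusing an unexpired listing if present.
    public List<Path> list(Path snapshotDir) {
        if (!enabled) {
            // Cache disabled (e.g. very large snapshot lists): always hit the filesystem.
            return listFromFilesystem(snapshotDir);
        }
        long now = System.nanoTime();
        Entry entry = cache.compute(snapshotDir, (dir, existing) -> {
            if (existing != null && now - existing.expiresAtNanos < 0) {
                return existing; // still fresh
            }
            return new Entry(listFromFilesystem(dir), now + ttlNanos);
        });
        return entry.files;
    }

    private static List<Path> listFromFilesystem(Path dir) {
        // Files.list returns a stream that must be closed to release the directory handle.
        try (Stream<Path> stream = Files.list(dir)) {
            return Collections.unmodifiableList(stream.sorted().collect(Collectors.toList()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a cache like this, a file added to the directory only becomes visible once the entry expires, which is the trade-off a short time-to-live is meant to bound. The data file locations from item 1 would need no expiry at all, since they are read once and do not change for the lifetime of the application.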



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
