[ 
https://issues.apache.org/jira/browse/CASSANDRA-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082264#comment-18082264
 ] 

Dmitry Konstantinov edited comment on CASSANDRA-21393 at 5/20/26 9:24 AM:
--------------------------------------------------------------------------

A few more resources about the technology:
* https://semiconductor.samsung.com/news-events/tech-blog/nvme-fdp-a-promising-new-ssd-data-placement-approach/
* https://download.semiconductor.samsung.com/resources/white-paper/getting-started-with-fdp-v4.pdf
* https://www.youtube.com/watch?v=c8Jw_WANn6A


was (Author: dnk):
Few more docs about the technology:
* https://semiconductor.samsung.com/news-events/tech-blog/nvme-fdp-a-promising-new-ssd-data-placement-approach/
* https://download.semiconductor.samsung.com/resources/white-paper/getting-started-with-fdp-v4.pdf

> Investigate: Add FDP (Flexible Data Placement) hints to write paths to reduce 
> SSD write amplification
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21393
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21393
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/SSTable
>            Reporter: Sam Lightfoot
>            Assignee: Sam Lightfoot
>            Priority: Normal
>
> Cassandra's write paths (commitlog, memtable flush, compaction output, hints, 
> system tables) currently issue writes to disk without any indication that 
> they belong to distinct streams with very different expected lifetimes. On 
> modern enterprise SSDs, this causes the device to interleave data with mixed 
> deathtimes into the same physical superblock, which inflates the SSD's 
> internal write amplification (typically 1.9-2.7x on enterprise NVMe under 
> realistic skewed workloads) as the SSD's internal garbage collector must 
> relocate still-valid pages when reclaiming space.
> NVMe Flexible Data Placement (FDP, NVMe TP 4146, ratified late 2022) lets the 
> host attach an 8-bit Placement Identifier to each write, which the device 
> uses to route writes from different streams into separate Reclaim Unit 
> Handles (RUHs). When streams with similar deathtimes share an RUH and streams 
> with different deathtimes are kept separate, the SSD's internal GC observes 
> superblocks that become fully invalid as a unit, driving SSD WAF toward 1.
> Cassandra is well-positioned to benefit from FDP because its write streams 
> have naturally distinct lifetime characteristics that the storage layer 
> cannot infer on its own:
>  * Commitlog segments are deleted on a rolling schedule decoupled from any 
> SSTable.
>  * Memtable flushes produce L0 SSTables with very short expected lifetimes.
>  * Compaction outputs at higher levels live progressively longer.
>  * Hints, system tables, and repair streams have their own distinct rewrite 
> cadences.
> Mixing these at the device layer is pure SSD-WAF cost with no upside. Recent 
> work in the database community (Lee et al., VLDB 2026, "How to Write to 
> SSDs") demonstrates that exposing this kind of host-side workload knowledge 
> to the SSD via FDP can eliminate SSD-level write amplification on commodity 
> devices, with corresponding gains in throughput and SSD endurance.
> Reference: How to Write to SSDs: 
> [https://www.vldb.org/pvldb/vol19/p1469-lee.pdf]
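
To make the stream separation described above concrete, here is a minimal sketch in Python. It is illustrative only and is not Cassandra code or an NVMe API: the WriteStream names, the placement identifier values, and the modulo fallback for devices with fewer handles are all assumptions made for this example. The only fixed fact used is the usual definition of write amplification (bytes written to NAND divided by bytes written by the host).

```python
from enum import Enum


class WriteStream(Enum):
    """Write paths named in the issue description (names are hypothetical)."""
    COMMITLOG = "commitlog"
    MEMTABLE_FLUSH = "memtable_flush"
    COMPACTION_HIGH_LEVEL = "compaction_high_level"
    HINTS = "hints"
    SYSTEM_TABLES = "system_tables"


# Hypothetical placement identifiers, one per stream. Real values would depend
# on how many reclaim unit handles the device exposes and how it numbers them.
PLACEMENT_ID = {
    WriteStream.COMMITLOG: 0,
    WriteStream.MEMTABLE_FLUSH: 1,
    WriteStream.COMPACTION_HIGH_LEVEL: 2,
    WriteStream.HINTS: 3,
    WriteStream.SYSTEM_TABLES: 4,
}


def placement_id_for(stream: WriteStream, available_handles: int) -> int:
    """Pick a placement identifier for a stream.

    Assumption: if the device exposes fewer handles than there are streams,
    fold the streams onto the available handles with a simple modulo. A real
    policy would more likely group streams with similar expected lifetimes.
    """
    if available_handles < 1:
        raise ValueError("device must expose at least one handle")
    return PLACEMENT_ID[stream] % available_handles


def write_amplification(host_bytes: int, nand_bytes: int) -> float:
    """SSD write amplification factor: NAND bytes written per host byte written."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return nand_bytes / host_bytes


if __name__ == "__main__":
    for stream in WriteStream:
        print(stream.value, "->", placement_id_for(stream, available_handles=8))
    # A device that writes 190 GB to NAND for 100 GB of host writes.
    print("WAF:", write_amplification(100, 190))
```

With 8 handles every stream in this sketch gets its own identifier; with only 2, several streams share one, which is the case where grouping by expected lifetime would matter most.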



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
