[ https://issues.apache.org/jira/browse/CASSANDRA-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082264#comment-18082264 ]
Dmitry Konstantinov commented on CASSANDRA-21393:
-------------------------------------------------
A few more docs about the technology:
* https://semiconductor.samsung.com/news-events/tech-blog/nvme-fdp-a-promising-new-ssd-data-placement-approach/
* https://download.semiconductor.samsung.com/resources/white-paper/getting-started-with-fdp-v4.pdf
> Investigate: Add FDP (Flexible Data Placement) hints to write paths to reduce
> SSD write amplification
> -----------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-21393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21393
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Local/SSTable
> Reporter: Sam Lightfoot
> Assignee: Sam Lightfoot
> Priority: Normal
>
> Cassandra's write paths (commitlog, memtable flush, compaction output, hints,
> system tables) currently issue writes to disk without any indication that
> they belong to distinct streams with very different expected lifetimes. On
> modern enterprise SSDs, this causes the device to interleave data with mixed
> deathtimes into the same physical superblock, which inflates the SSD's
> internal write amplification (typically 1.9-2.7x on enterprise NVMe under
> realistic skewed workloads) as the SSD's internal garbage collector must
> relocate still-valid pages when reclaiming space.
>
> NVMe Flexible Data Placement (FDP, NVMe TP 4146, ratified late 2022) lets the
> host attach an 8-bit Placement Identifier to each write, which the device
> uses to route writes from different streams into separate Reclaim Unit
> Handles (RUHs). When streams with similar deathtimes share an RUH and streams
> with different deathtimes are kept separate, the SSD's internal GC observes
> superblocks that become fully invalid as a unit, driving SSD WAF toward 1.
>
> Cassandra is well-positioned to benefit from FDP because its write streams
> have naturally distinct lifetime characteristics that the storage layer
> cannot infer on its own:
> * Commitlog segments are deleted on a rolling schedule decoupled from any
> SSTable.
> * Memtable flushes produce L0 SSTables with very short expected lifetimes.
> * Compaction outputs at higher levels live progressively longer.
> * Hints, system tables, and repair streams have their own distinct rewrite
> cadences.
>
> Mixing these at the device layer is pure SSD-WAF cost with no upside. Recent
> work in the database community (Lee et al., VLDB 2026, "How to Write to
> SSDs") demonstrates that exposing this kind of host-side workload knowledge
> to the SSD via FDP can eliminate SSD-level write amplification on commodity
> devices, with corresponding gains in throughput and SSD endurance.
>
> Reference: How to Write to SSDs:
> [https://www.vldb.org/pvldb/vol19/p1469-lee.pdf]
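
To make the stream separation described in the issue concrete, here is a minimal sketch of how write streams could be mapped to placement identifiers, plus the usual write amplification arithmetic. It is illustrative only and is not Cassandra code or an NVMe API: the names WriteStream, assign_placement_ids, and write_amplification are hypothetical, the 8-bit identifier width is taken from the issue text as stated, and the fallback of letting overflow streams share the last handle is just one possible policy.

```python
from enum import Enum

# Assumption: the issue text describes the Placement Identifier as 8 bits wide,
# so this sketch allows at most 256 distinct values (0..255).
MAX_PLACEMENT_ID = 0xFF


class WriteStream(Enum):
    """Write streams named in the issue description (hypothetical enum)."""
    COMMITLOG = "commitlog"
    MEMTABLE_FLUSH = "memtable_flush"
    COMPACTION_OUTPUT = "compaction_output"
    HINTS = "hints"
    SYSTEM_TABLES = "system_tables"
    REPAIR = "repair"


def assign_placement_ids(streams, available_handles):
    """Map each stream to a placement identifier.

    Streams get distinct identifiers while handles last. If the device exposes
    fewer handles than there are streams, the remaining streams share the last
    handle (an assumed fallback policy, chosen only for illustration).
    """
    if not 1 <= available_handles <= MAX_PLACEMENT_ID + 1:
        raise ValueError("available_handles must be between 1 and 256")
    ids = {}
    for index, stream in enumerate(streams):
        ids[stream] = min(index, available_handles - 1)
    return ids


def write_amplification(host_bytes, relocated_bytes):
    """Device WAF = (host writes + GC relocation writes) / host writes."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return (host_bytes + relocated_bytes) / host_bytes


if __name__ == "__main__":
    for stream, pid in assign_placement_ids(list(WriteStream), 4).items():
        print(f"{stream.value}: placement id {pid}")
    # No relocation at all corresponds to the WAF of 1 that the issue aims for.
    print(write_amplification(100, 0))
```

With at least six handles every stream listed above gets its own identifier; with fewer, the later streams in the list are grouped together, so the ordering of the list decides which streams stay isolated.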