> * The filesystem (BlueStore or whatever) or DBMS (RocksDB or
>   whatever) is required to issue "write buffer commit" commands
>   to the device whenever necessary to ensure metadata or data is
>   not lost by cache power loss and remains consistent.
... which is why it’s advisable to disable the HDD volatile write cache (a rough sketch of how I do that is at the end of this message). Yet I’ve seen, with my own eyes, data lost when power is lost. When an SSD is too cheap to include a supercap, I’m certainly not going to trust that it implements the above rigorously. PLP != DRAMless.

> * _Some_ loss is inevitable: there can be data still in the
>   middle of being written if the loss happens during a "write to
>   device" operation and efficient filesystems can do pretty
>   large ones.

I did a PDU test and 2/3 of the OSDs were corrupted. YMMV; not going to debate you, Jerry.

>> Just don’t waste money on 3 DWPD SKUs or RAID HBAs.
>
> Usually, but I have seen some cases where the DB/WAL SSDs really
> should have been 3 DWPD, especially if the SSD is shared among
> data-only OSDs, which is almost inevitable in the case of servers
> with many HDDs.

3 DWPD and 1 DWPD SSDs are usually the exact same hardware with different overprovisioning. You can turn one into the other, or pick something in between, with software (back-of-the-envelope numbers at the end of this message). That said, see above where you wrote about operational false economies; hassling with hybrid OSDs is right up there.

> In those cases better to be safe than sorry.

Offload SSDs require paying extra for universal drive bays or rear cages. The former are prone to fussy tri-mode HBAs, and the latter to too large an HDD-to-SSD ratio, unless you go down the dead end of SATA SSDs. IMHO many deployments are better off with monolithic HDD OSDs, using the available slots for more spindles, with separate index/metadata pools. YMMV.
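PS: since the write-cache point keeps coming up, here is roughly the sort of check I run before deploying HDD OSDs. A sketch, not a turnkey tool: it assumes lsblk and hdparm are installed, the device filtering is naive, and on many drives "hdparm -W 0" does not survive a power cycle, so it usually gets re-applied from a udev rule or similar.

#!/usr/bin/env python3
# Sketch: list whole rotational disks and (optionally) turn off their
# volatile write cache with "hdparm -W 0". Dry-run by default; flip
# dry_run to actually apply it. Adjust the device filtering to taste.
import json
import subprocess

def rotational_disks():
    """Yield /dev paths of whole rotational disks as reported by lsblk."""
    out = subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,ROTA,TYPE"],
        capture_output=True, text=True, check=True,
    ).stdout
    for dev in json.loads(out)["blockdevices"]:
        # older lsblk prints "1"/"0", newer prints true/false
        if dev["type"] == "disk" and str(dev["rota"]).lower() in ("1", "true"):
            yield "/dev/" + dev["name"]

def disable_write_cache(dev, dry_run=True):
    """Run "hdparm -W 0 <dev>"; note the setting may reset on power cycle."""
    cmd = ["hdparm", "-W", "0", dev]
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for dev in rotational_disks():
        disable_write_cache(dev, dry_run=True)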
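PPS: back-of-the-envelope numbers behind the "same NAND, different overprovisioning" claim. The SKU sizes below are illustrative, not any vendor's datasheet: the linear part of the endurance gain is simple arithmetic, and the rest comes from lower write amplification at higher OP, which is workload-dependent.

# Illustrative only: assume both SKUs ship the same ~8.2 TB of raw NAND
# and differ only in how much of it is exposed to the user.

def tbw(dwpd, user_tb, years=5):
    """Rated writes over the warranty period, in TB (DWPD x capacity x days)."""
    return dwpd * user_tb * 365 * years

def op_percent(raw_tb, user_tb):
    """Spare area as a share of the raw NAND."""
    return 100.0 * (raw_tb - user_tb) / raw_tb

raw_nand_tb = 8.192
for dwpd, user_tb in [(1, 7.68), (3, 6.4)]:
    print(f"{user_tb} TB @ {dwpd} DWPD: "
          f"~{op_percent(raw_nand_tb, user_tb):.0f}% OP, "
          f"~{tbw(dwpd, user_tb) / 1000:.0f} PB written over 5 years")

Shrinking the user capacity yourself (an HPA, a smaller NVMe namespace, or simply never writing part of the drive after a full trim) moves you along the same curve, which is the "with software" part above.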
