> * The filesystem (BlueStore or whatever) or DBMS (RocksDB or
>   whatever) is required to issue "write buffer commit" commands
>   to the device whenever necessary to ensure metadata or data is
>   not lost by cache power loss and remains consistent.
... which is why it’s advisable to disable the HDD volatile write cache (a rough sketch of how I do that is at the end of this message). Yet I’ve seen, with my own eyes, data lost when power is lost. When an SSD is too cheap to include a supercap, I’m certainly not going to trust that it implements the above rigorously. PLP != DRAMless.

> * _Some_ loss is inevitable: there can be data still in the
>   middle of being written if the loss happens during a "write to
>   device" operation and efficient filesystems can do pretty
>   large ones.

I did a PDU test and 2/3 of the OSDs were corrupted. YMMV; not going to debate you, Jerry.

>> Just don’t waste money on 3 DWPD SKUs or RAID HBAs.
>
> Usually, but I have seen some cases where the DB/WAL SSDs really
> should have been 3 DWPD, especially if the SSD is shared among
> data-only OSDs, which is almost inevitable in the case of servers
> with many HDDs.

3 DWPD and 1 DWPD SSDs are usually the exact same hardware with different overprovisioning. You can turn one into the other, or pick something in between, with software (back-of-the-envelope numbers at the end of this message). That said, see above where you wrote about operational false economies; hassling with hybrid OSDs is right up there.

> In those cases better to be safe than sorry.

Offload SSDs require paying extra for universal drive bays or rear cages. The former are prone to fussy tri-mode HBAs, and the latter to too large an HDD-to-SSD ratio, unless you go down the dead end of SATA SSDs. IMHO many deployments are better off with monolithic HDD OSDs, using the available slots for more spindles, with separate index/metadata pools. YMMV.
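PS: since the write-cache point keeps coming up, here is roughly the sort of check I run before deploying HDD OSDs. A sketch, not a turnkey tool: it assumes lsblk and hdparm are installed, the device filtering is naive, and on many drives "hdparm -W 0" does not survive a power cycle, so it usually gets re-applied from a udev rule or similar.

#!/usr/bin/env python3
# Sketch: list whole rotational disks and (optionally) turn off their
# volatile write cache with "hdparm -W 0". Dry-run by default; flip
# dry_run to actually apply it. Adjust the device filtering to taste.
import json
import subprocess

def rotational_disks():
    """Yield /dev paths of whole rotational disks as reported by lsblk."""
    out = subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,ROTA,TYPE"],
        capture_output=True, text=True, check=True,
    ).stdout
    for dev in json.loads(out)["blockdevices"]:
        # older lsblk prints "1"/"0", newer prints true/false
        if dev["type"] == "disk" and str(dev["rota"]).lower() in ("1", "true"):
            yield "/dev/" + dev["name"]

def disable_write_cache(dev, dry_run=True):
    """Run "hdparm -W 0 <dev>"; note the setting may reset on power cycle."""
    cmd = ["hdparm", "-W", "0", dev]
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for dev in rotational_disks():
        disable_write_cache(dev, dry_run=True)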
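PPS: back-of-the-envelope numbers behind the "same NAND, different overprovisioning" claim. The SKU sizes below are illustrative, not any vendor's datasheet: the linear part of the endurance gain is simple arithmetic, and the rest comes from lower write amplification at higher OP, which is workload-dependent.

# Illustrative only: assume both SKUs ship the same ~8.2 TB of raw NAND
# and differ only in how much of it is exposed to the user.

def tbw(dwpd, user_tb, years=5):
    """Rated writes over the warranty period, in TB (DWPD x capacity x days)."""
    return dwpd * user_tb * 365 * years

def op_percent(raw_tb, user_tb):
    """Spare area as a share of the raw NAND."""
    return 100.0 * (raw_tb - user_tb) / raw_tb

raw_nand_tb = 8.192
for dwpd, user_tb in [(1, 7.68), (3, 6.4)]:
    print(f"{user_tb} TB @ {dwpd} DWPD: "
          f"~{op_percent(raw_nand_tb, user_tb):.0f}% OP, "
          f"~{tbw(dwpd, user_tb) / 1000:.0f} PB written over 5 years")

Shrinking the user capacity yourself (an HPA, a smaller NVMe namespace, or simply never writing part of the drive after a full trim) moves you along the same curve, which is the "with software" part above.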
