+1 for Gregory's response.  With filestore, if you lost a journal SSD and
followed the steps you outlined, you were leaving yourself open to
corrupted data.  Any write that was acked by the journal, but not yet
flushed to the data disk, would be lost while the rest of the cluster
assumed it was there.  With a failed journal SSD on filestore, you should
have removed all affected OSDs and re-added them with a new journal
device.  The same is true of Bluestore.
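
For reference, the remove/re-add cycle I mean goes roughly like this (the
OSD id and device paths are placeholders, and the exact ceph-volume syntax
depends on your release, so treat it as an outline rather than a recipe):

    # let the cluster re-replicate away from the affected OSD, then remove it
    ceph osd out 12
    systemctl stop ceph-osd@12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    # re-create it against the replacement journal SSD
    ceph-volume lvm create --filestore --data /dev/sdd --journal /dev/nvme0n1p1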

Where Bluestore differs from Filestore is the case where your SSD stops
accepting writes but can still be read (or any time you can still read
from the SSD and are swapping it out).  With Filestore you could flush
the journals and create new journals on a new SSD for the OSDs.  This is
not possible with Bluestore, as you cannot modify the WAL or RocksDB
portions of a Bluestore OSD after creation.  If you started with your
RocksDB and WAL on an SSD, you could not later add an NVMe and move the
WAL to it without removing and re-creating the OSDs with the new layout,
as sketched below.
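
To make the contrast concrete, the filestore journal swap I'm referring to
went roughly like this (osd.12 and the partition paths are placeholders,
and you repoint the OSD's journal symlink at the new partition in between):

    systemctl stop ceph-osd@12
    ceph-osd -i 12 --flush-journal   # drain pending writes to the data disk
    # repoint the journal symlink/partition at the new SSD, then:
    ceph-osd -i 12 --mkjournal       # build a fresh journal there
    systemctl start ceph-osd@12

There is no Bluestore equivalent today; moving block.db or block.wal means
destroying the OSD and re-creating it with the layout you want, e.g.
ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/nvme0n1p1.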

On Mon, Jan 29, 2018 at 10:58 AM Gregory Farnum <[email protected]> wrote:

> On Mon, Jan 29, 2018 at 9:37 AM Vladimir Prokofev <[email protected]> wrote:
>
>> Hello.
>>
>> In short: what are the consequences of losing an external WAL/DB
>> device (assuming it's an SSD) in bluestore?
>>
>> In comparison with filestore - we used to have an external SSD for
>> journaling multiple HDD OSDs. Hardware failure of such a device was not
>> that big of a deal, as we could quickly use xfs_repair and initialize a
>> new journal. You didn't have to redeploy the OSDs, just provide them with
>> a new journal device, remount XFS, and restart the OSD process so it
>> could quickly update its state. A healthy state could be restored in a
>> matter of minutes.
>>
>> That was with filestore.
>> Now what's the situation with bluestore?
>>
>> What will happen in different scenarios, like having only the WAL on an
>> external device, or only the DB, or both WAL+DB?
>> I kind of assume that losing the DB means losing the OSD, and it has to
>> be redeployed?
>>
>
> I'll let the BlueStore guys speak to this more directly, but I believe you
> lose the OSD.
>
> However, let's be clear: this is not really a different situation from
> FileStore. You *can*, with FileStore, fix the xfs filesystem and
> persuade the OSD to start up again by giving it a new journal. But this
> is a *lie* to the OSD about the state of its data and is very likely to
> introduce data loss or inconsistencies. You shouldn't do it unless the
> OSD hosts the only copy of a PG in your cluster.
> -Greg
>
>
>> What about the WAL? Are there any specific commands to restore it,
>> similar to xfs_repair?
>> I didn't find any docs regarding this, but maybe I'm looking in the
>> wrong place, so a link to such a doc would be great.