>> I strongly recommend running 5 mons.  Given how you phrase the above it
>> seems as though you might mean 3 dedicated mon nodes?  If so I’d still run
>> 5 mons and add the mon cephadm host label to two of the OSD nodes.
>> 
> 
> Right now, only one mon is "active" and used all of the time. You would
> still run 5 mons anyway?

Are you conflating mons and mgrs?  Mons form a quorum; you need more than half
of them able to reach each other.
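
Rule of thumb: 3 mons tolerate one of them being down, 5 tolerate two.  A
quick way to see who is currently in quorum:

# mon count, leader and current quorum membership
ceph mon stat
# more detail, including any mons that have fallen out of quorum
ceph quorum_status -f json-pretty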

Yes, I would run 5 mons, to survive a double failure.  And at least 3 mgrs.
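
If you're on cephadm, a sketch of the service specs I'd apply; the counts and
the "mon" label here are just my assumption of how you'd lay it out:

# mons-mgrs.yaml -- apply with: ceph orch apply -i mons-mgrs.yaml
service_type: mon
placement:
  count: 5
  label: mon      # tag hosts first: ceph orch host label add <host> mon
---
service_type: mgr
placement:
  count: 3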

> 
> NVME is used only for OS. I was considering offloading WAL+DB to NVME, but
> that would mean 3-5 NVME devices per OSD/JBOD node and we do not have a
> configuration like that right now.

Nothing stops you from adding those, though for many applications the benefit 
isn’t worth it.
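
If you ever do add them, it's only one more flag at prepare time.  A sketch
with placeholder device names -- the DB device also carries the WAL unless
you split it out separately:

# put this OSD's DB (and WAL) on an NVMe LV or partition
ceph-volume lvm prepare --data /dev/sdX --block.db /dev/nvme0n1pY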

> 
>> Do you have your index pool on SSDs?  How old are those existing OSDs?  If
>> they were deployed in the 6 TB spinner era, they may have the old
>> bluestore_min_alloc_size value baked in.  Check `ceph osd metadata`, see if
>> any are not 4KiB.  If you have or will have a significant population of
>> small RGW objects, serially redeploying the old OSDs — especially if
>> they’re Filestore — would have advantages.
>> 
> 
> Index pool is on HDD. HDDs (mostly Seagate) are actually quite old

At 6 TB that was kinda assumed ;)


>  from 2016 and lately we are seeing a higher failure rate,

That’s very much to be expected.

> I checked bluestore_min_alloc_size and indeed there is a 4K value set for
> all of them. Even on new ones.

Groovy.  You checked with “ceph osd metadata”?
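
For the archives, the sort of one-liner I'd use to eyeball it across every
OSD -- assumes jq is installed and that your release exposes the field in the
OSD metadata:

ceph osd metadata | jq -r '.[] | "\(.id) \(.bluestore_min_alloc_size)"' | sort -n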

> 
> We are replacing failed disks on "same OSD id" with something like:
> 
> # destroy failed osd
> systemctl stop ceph-osd@{id}
> ceph osd down osd.{id}
> ceph osd destroy {id} --yes-i-really-mean-it
> 
> # prepare new one on same osd id
> ceph-volume lvm prepare --osd-id {id} --data /dev/sdX
> ceph-volume lvm list
> ceph-volume lvm activate {id} {osd fsid}
> 
> Is this "a problem"?

Not at all.
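
The only thing I'd add, and you probably do it already: after activating,
confirm the OSD came back up under the same ID and let backfill finish before
touching the next drive:

ceph osd tree | grep osd.{id}
ceph -s        # watch recovery/backfill progress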

