I just upgraded to Luminous yesterday, and before the upgrade was complete
we had SSD OSDs flapping up and down and scrub errors in the RGW index
pools.  I consistently made sure that all OSDs were back up and the
cluster was healthy before continuing, and I never reduced min_size below
2 for the pools on the NVMes.  The RGW daemons for our 2 multi-site realms
restarted themselves (due to a long-standing memory leak supposedly fixed
in 12.2.2) and prematurely upgraded themselves before all of the OSDs had
been upgraded, and I thought that was the reason for the scrub errors and
inconsistent PGs... however, this morning I had a scrub error in our
local-only realm, which does not use multi-site and had not restarted any
of its RGW daemons until after all of the OSDs had been upgraded.

Is there anything we should be looking at for this?  Any idea what could be
causing these scrub errors?  I can issue a repair on the PG and the scrub
errors go away, but they keep coming back on the same PGs later.  I can
also issue a deep-scrub on every PG in these pools and they all return
clean, but the same PGs later show up again with scrub errors and are
marked inconsistent.
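In case it helps, the commands involved look roughly like this (the pool
name and PG ID below are placeholders, not our actual ones):

```shell
# Find which PGs in a pool are currently inconsistent
# (placeholder pool name).
rados list-inconsistent-pg default.rgw.buckets.index

# Dump shard-level detail for one inconsistent PG *before* repairing,
# since a repair clears the inconsistency record (placeholder PG ID).
rados list-inconsistent-obj 7.1a --format=json-pretty

# Repair the PG; the scrub error then goes away.
ceph pg repair 7.1a

# A subsequent deep scrub of the same PG comes back clean...
# until the error reappears later.
ceph pg deep-scrub 7.1a
```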
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com