Hi All. I was on luminous 12.2.0, as I do *not* enable repo updates for critical
software (e.g. OpenStack / Ceph). Upgrades need to occur on an intentional
basis!
So I first upgraded to luminous 12.2.11, following the upgrade guide and
release notes.
[root@lvtncephx110 ~]# ceph version
ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous
(stable)
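
As a quick sanity check that every daemon (not just this node) is actually
running 12.2.11, luminous also has a cluster-wide version summary:

[root@lvtncephx110 ~]# ceph versions

Anything still reporting 12.2.0 in that per-daemon breakdown would need a
restart before going further.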
Also, following Eugen's advice, I set the appropriate cluster flag:
ceph osd require-osd-release luminous
Now my cluster shows:
[root@lvtncephx110 ~]# ceph osd dump | grep recovery
flags sortbitwise,recovery_deletes,purged_snapdirs
I like Paul's note to perform a full deep scrub. It will take some time to
complete, but it ensures that all data is touched and pruned as necessary -
effectively a good fsck on each OSD:
ceph osd deep-scrub all
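
Paul's one-liner should cover everything; in case "all" isn't accepted on a
given release, an equivalent loop over the OSD ids (untested, just a sketch),
plus a rough way to watch progress, would be:

for osd in $(ceph osd ls); do
    # queue a deep scrub of every PG this OSD is primary for
    ceph osd deep-scrub "$osd"
done
# rough progress check: count PGs currently deep scrubbing
ceph pg dump pgs_brief 2>/dev/null | grep -c 'scrubbing+deep'

Looping over every OSD still touches every PG, since each PG has exactly one
primary.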
[root@lvtncephx110 ~]# ceph status
cluster:
id: 5fabf1b2-cfd0-44a8-a6b5-fb3fd0545517
health: HEALTH_OK
services:
mon: 3 daemons, quorum lvtncephx121,lvtncephx122,lvtncephx123
mgr: lvtncephx121(active), standbys: lvtncephx122, lvtncephx123
mds: cephfs-1/1/1 up {0=lvtncephx152=up:active}, 1 up:standby
osd: 18 osds: 18 up, 18 in
rgw: 2 daemons active
data:
pools: 23 pools, 2016 pgs
objects: 2.67M objects, 10.1TiB
usage: 20.2TiB used, 38.6TiB / 58.8TiB avail
pgs: 2011 active+clean
5 active+clean+scrubbing+deep
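
Rather than just waiting for the scrubbing counter in ceph status to hit
zero, I want to confirm that every PG has actually been deep-scrubbed since
the pass was kicked off. Assuming the luminous "ceph pg dump" JSON keeps the
per-PG stats in a top-level pg_stats array (I haven't verified the exact
layout), something like this should list any stragglers (substitute the date
the scrubs were started):

ceph pg dump --format=json 2>/dev/null \
    | jq -r '.pg_stats[] | select(.last_deep_scrub_stamp < "2019-02-08") | .pgid'

An empty result would mean every PG's last deep scrub is newer than that date.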
This means that I *could* upgrade to mimic now (at least as soon as the deep
scrub completes). However, other posts suggest there could be a problem with
the pglog_hardlimit flag, so I should wait until 13.2.5.
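
For what it's worth, that flag has to be set explicitly by the operator (the
12.2.11 release notes describe doing it with "ceph osd set pglog_hardlimit"
once the whole cluster is on new enough code), so it's easy to confirm it is
not set here yet -- if it were, my understanding is it would show up in the
same flags line as above:

[root@lvtncephx110 ~]# ceph osd dump | grep pglog_hardlimit

No output means the flag is absent, so nothing on this cluster depends on it
yet.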
Thanks for the suggestions. I feel confident in our ability to upgrade to
Mimic within the next couple of months (time to let 13.2.5 settle).
Andy
> On Feb 7, 2019, at 1:21 PM, Paul Emmerich <[email protected]> wrote:
>
> You need to run a full deep scrub before continuing the upgrade; the
> reason is that the deep scrub migrates the format of some
> snapshot-related on-disk data structure.
>
> Looks like you only tried a normal scrub, not a deep scrub.
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Thu, Feb 7, 2019 at 4:34 PM Eugen Block <[email protected]> wrote:
>>
>> Hi,
>>
>> could it be a missing 'ceph osd require-osd-release luminous' on your
>> cluster?
>>
>> When I check a luminous cluster I get this:
>>
>> host1:~ # ceph osd dump | grep recovery
>> flags sortbitwise,recovery_deletes,purged_snapdirs
>>
>> The flags in the code you quote seem related to that.
>> Can you check that output on your cluster?
>>
>> Found this in a thread from last year [1].
>>
>>
>> Regards,
>> Eugen
>>
>> [1] https://www.spinics.net/lists/ceph-devel/msg40191.html
>>
>> Zitat von Andrew Bruce <[email protected]>:
>>
>>> Hello All! Yesterday I started the upgrade from luminous to mimic with
>>> one of my 3 MONs.
>>>
>>> After applying the mimic yum repo and updating, a restart reports the
>>> following error in the MON log file:
>>>
>>> ==> /var/log/ceph/ceph-mon.lvtncephx121.log <==
>>> 2019-02-07 10:02:40.110 7fc8283ed700 -1 mon.lvtncephx121@0(probing)
>>> e4 handle_probe_reply existing cluster has not completed a full
>>> luminous scrub to purge legacy snapdir objects; please scrub before
>>> upgrading beyond luminous.
>>>
>>> My question is simply: What exactly does this require?
>>>
>>> Yesterday afternoon I did a manual:
>>>
>>> ceph osd scrub all
>>>
>>> But that has zero effect. I still get the same message on restarting the MON.
>>>
>>> I have no errors in the cluster except for the single MON
>>> (lvtncephx121) that I'm working to migrate to mimic first:
>>>
>>> [root@lvtncephx110 ~]# ceph status
>>> cluster:
>>> id: 5fabf1b2-cfd0-44a8-a6b5-fb3fd0545517
>>> health: HEALTH_WARN
>>> 1/3 mons down, quorum lvtncephx122,lvtncephx123
>>>
>>> services:
>>> mon: 3 daemons, quorum lvtncephx122,lvtncephx123, out of quorum:
>>> lvtncephx121
>>> mgr: lvtncephx122(active), standbys: lvtncephx123, lvtncephx121
>>> mds: cephfs-1/1/1 up {0=lvtncephx151=up:active}, 1 up:standby
>>> osd: 18 osds: 18 up, 18 in
>>> rgw: 2 daemons active
>>>
>>> data:
>>> pools: 23 pools, 2016 pgs
>>> objects: 2608k objects, 10336 GB
>>> usage: 20689 GB used, 39558 GB / 60247 GB avail
>>> pgs: 2016 active+clean
>>>
>>> io:
>>> client: 5612 B/s rd, 3756 kB/s wr, 1350 op/s rd, 412 op/s wr
>>>
>>> FWIW: The source code has the following:
>>>
>>> // Monitor.cc
>>> if (!osdmon()->osdmap.test_flag(CEPH_OSDMAP_PURGED_SNAPDIRS) ||
>>>     !osdmon()->osdmap.test_flag(CEPH_OSDMAP_RECOVERY_DELETES)) {
>>>   derr << __func__ << " existing cluster has not completed a full luminous"
>>>        << " scrub to purge legacy snapdir objects; please scrub before"
>>>        << " upgrading beyond luminous." << dendl;
>>>   exit(0);
>>> }
>>>
>>> So, two questions:
>>> How do I show the current flags in the OSD map that the monitor checks?
>>> How do I get these flags set so the MON will actually start?
>>>
>>> Thanks,
>>> Andy
>>
>>
>>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com