[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2022-03-18 Thread James Page
** Changed in: ceph (Ubuntu)
     Assignee: James Page (james-page) => (unassigned)

** Changed in: ceph (Ubuntu)
       Status: Incomplete => Opinion

-- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2021-01-11 Thread Jay Ring
That sounds promising. I replaced my node a while ago so I can't verify this one way or the other, but it certainly sounds like it may be the problem, and it would explain why Page could not duplicate it in his new install. One of the reasons I bothered confirming the bug report was so that future

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2021-01-11 Thread Matthias Hüther
I have the same issue. That's why I've been testing a few things over the last few days: Upgrade process: Luminous -> Mimic -> Nautilus -> Octopus (All Versions run under Bionic) It doesn't matter whether I activate msgr2 or not. I always get the problem after upgrading to Octopus:

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-12-09 Thread Trent Lloyd
This issue appears to be documented here:
https://docs.ceph.com/en/latest/releases/nautilus/#instructions

Complete the upgrade by disallowing pre-Nautilus OSDs and enabling all new Nautilus-only functionality:

    # ceph osd require-osd-release nautilus

Important: This step is mandatory. Failure to

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-09-17 Thread madar
I am in the middle of a mimic -> nautilus -> octopus upgrade, and got the same 'tick checking mon for new map' cycle from my 15.2.3 OSD daemons. After

    $ ceph osd require-osd-release mimic

octopus OSDs can connect to the cluster.
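
The report above suggests the OSDs stay stuck until require-osd-release is raised. A minimal Python sketch of a pre-flight check follows. It assumes a rule of thumb that an OSD of release N only joins when the cluster's require_osd_release flag is at most two releases behind it. That matches this report (mimic was enough for Octopus OSDs) but is an assumption, not something this thread confirms. The function name osd_can_join and the constant MAX_GAP are hypothetical.

```python
# Ceph releases mentioned in this thread, oldest first.
RELEASES = ["luminous", "mimic", "nautilus", "octopus"]

# ASSUMPTION (not confirmed in this thread): an OSD refuses to join when
# require_osd_release is more than this many releases behind the OSD itself.
MAX_GAP = 2


def osd_can_join(require_osd_release: str, osd_release: str) -> bool:
    """Return True if the cluster flag is recent enough for this OSD release.

    Also returns False when the flag is newer than the OSD, because the flag
    disallows OSDs older than the named release.
    """
    flag = RELEASES.index(require_osd_release.lower())
    osd = RELEASES.index(osd_release.lower())
    return 0 <= osd - flag <= MAX_GAP


if __name__ == "__main__":
    # Show which flag values would let an Octopus OSD in under the assumption.
    for flag_name in RELEASES:
        print(flag_name, "->", osd_can_join(flag_name, "octopus"))
```

The current flag value could be read from the require_osd_release line of `ceph osd dump` and passed in as the first argument.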

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread Jay Ring
tail -f /var/log/ceph/ceph-osd.13.log
2020-05-22T17:27:43.909-0500 7f44708ca700 1 osd.13 46107 tick checking mon for new map
2020-05-22T17:28:14.825-0500 7f44708ca700 1 osd.13 46107 tick checking mon for new map
2020-05-22T17:28:44.838-0500 7f44708ca700 1 osd.13 46107 tick checking mon for
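
The repeating log line above is the symptom everyone in this thread describes. As a rough illustration, the Python sketch below counts how often each OSD logs 'tick checking mon for new map' at the same map epoch, which is one way to spot the stuck cycle in a log file. The field layout is inferred from the two complete lines quoted above; the function name stuck_epochs and the min_repeats threshold are hypothetical.

```python
import re

# Matches lines such as:
# 2020-05-22T17:27:43.909-0500 7f44708ca700 1 osd.13 46107 tick checking mon for new map
# Layout assumed from the log excerpt in this thread: timestamp, thread id,
# log level, OSD name, map epoch, message.
TICK_RE = re.compile(
    r"^\S+\s+\S+\s+\d+\s+(osd\.\d+)\s+(\d+)\s+tick checking mon for new map$"
)


def stuck_epochs(lines, min_repeats=2):
    """Return {(osd, epoch): count} for ticks repeated at an unchanged epoch."""
    counts = {}
    for line in lines:
        match = TICK_RE.match(line.strip())
        if match:
            key = (match.group(1), int(match.group(2)))
            counts[key] = counts.get(key, 0) + 1
    return {key: n for key, n in counts.items() if n >= min_repeats}


# The two complete lines quoted in the message above.
SAMPLE = [
    "2020-05-22T17:27:43.909-0500 7f44708ca700 1 osd.13 46107 tick checking mon for new map",
    "2020-05-22T17:28:14.825-0500 7f44708ca700 1 osd.13 46107 tick checking mon for new map",
]

if __name__ == "__main__":
    print(stuck_epochs(SAMPLE))
```

An epoch that never advances across many ticks is only a hint; a healthy but idle OSD may also log this line occasionally.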

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread Jay Ring
/etc/ceph/ceph.conf
mon host = 192.168.120.1 192.168.120.2 192.168.120.3

ceph mon dump:
epoch 7
fsid
last_changed 2020-05-16T23:16:32.234657-0500
created 2016-04-08T10:30:10.123758-0500
min_mon_release 15 (octopus)
0: [v2:192.168.120.1:3300/0,v1:192.168.120.1:6789/0] mon.temple-h1
1:
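
The mon dump above lists both a v2 and a v1 address for the monitor. For anyone wanting to check this across many monitors, here is a small Python sketch that extracts the protocol, host and port from one such line. It only handles IPv4 addresses in the format shown above, and the helper names parse_addrvec and has_v2 are hypothetical.

```python
import re

# One address inside a mon dump address vector, e.g. "v2:192.168.120.1:3300/0".
# IPv4 only; bracketed IPv6 addresses are not handled by this sketch.
ADDR_RE = re.compile(r"(v[12]):([^:/\],]+):(\d+)/\d+")


def parse_addrvec(line):
    """Return a list of (protocol, host, port) tuples found in a mon dump line."""
    return [(proto, host, int(port)) for proto, host, port in ADDR_RE.findall(line)]


def has_v2(line):
    """Return True if the monitor on this line advertises a v2 address."""
    return any(proto == "v2" for proto, _host, _port in parse_addrvec(line))


# The monitor line quoted in the message above.
MON_LINE = "0: [v2:192.168.120.1:3300/0,v1:192.168.120.1:6789/0] mon.temple-h1"

if __name__ == "__main__":
    print(parse_addrvec(MON_LINE))
    print(has_v2(MON_LINE))
```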

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread James Page
For example, the test deployment I have uses:

mon_host = 10.5.0.8,10.5.0.5,10.5.0.19

https://bugs.launchpad.net/bugs/1874939

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread James Page
To confirm:

tcp 0 0 10.5.0.8:3300 0.0.0.0:* LISTEN 64045 27128 784/ceph-mon
tcp 0 0 10.5.0.8:6789 0.0.0.0:* LISTEN 64045 27129 784/ceph-mon

3300 == v2
6789 == v1

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread James Page
https://docs.ceph.com/docs/master/rados/configuration/msgr2/#transitioning-from-v1-only-to-v2-plus-v1

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread James Page
Something was tickling my brain about upgrades that we dealt with in the ceph charms a while back. The MONs can run v1 and v2 messenger ports; however, if a port is specified in mon hosts in ceph.conf, it's possible that the v2 port is disabled, which is why the OSD can't connect back to the cluster.
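
To make the suspicion above easier to check, here is a minimal Python sketch that flags mon host entries carrying an explicit port. It assumes IPv4 addresses separated by commas or whitespace, as in the ceph.conf excerpts quoted in this thread. The function name entries_with_explicit_port is hypothetical, and the second example value is invented for illustration; no reporter in this thread posted a mon host line with ports.

```python
import re

V2_PORT = 3300  # msgr2, per the netstat output in this thread
V1_PORT = 6789  # legacy messenger


def entries_with_explicit_port(mon_host: str):
    """Return (host, port) for every mon host entry that pins a port.

    IPv4 only: a bracketed IPv6 address would need different parsing.
    """
    flagged = []
    for entry in re.split(r"[,\s]+", mon_host.strip()):
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep and port.isdigit():
            flagged.append((host, int(port)))
    return flagged


if __name__ == "__main__":
    # Value from the test deployment described in this thread: no ports pinned.
    print(entries_with_explicit_port("10.5.0.8,10.5.0.5,10.5.0.19"))
    # Hypothetical value with the v1 port pinned on the first monitor.
    for host, port in entries_with_explicit_port("192.168.120.1:6789 192.168.120.2"):
        note = "v1 only" if port == V1_PORT else "check protocol"
        print(host, port, note)
```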

Re: [Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread James Page
On Fri, May 22, 2020 at 11:25 AM Jay Ring <1874...@bugs.launchpad.net> wrote:

> "However it should be possible to complete the do-release-upgrade to the
> point of requesting a reboot - don't - drop to the CLI and get all
> machines to this point and then:
>
> restart the mons across all three

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread Jay Ring
"As a side note - even if there is a bug here (and it sounds like there might be) I would recommend placing the mon and mgr daemons in LXD containers on top of the machines hosting the OSDs"

Yes. I would strongly suggest doing this also. That is how Ceph now recommends it anyway. However,

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread Jay Ring
"However it should be possible to complete the do-release-upgrade to the point of requesting a reboot - don't - drop to the CLI and get all machines to this point and then:

restart the mons across all three machines
restart the mgrs across all three machines
restart the osds across all

Re: [Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread James Page
Hi Christian

On Fri, May 22, 2020 at 8:10 AM Christian Huebner <1874...@bugs.launchpad.net> wrote:

> i filed this bug specifically for hyperconverged environments. Upgrading
> monitor nodes first and then upgrading separate OSD nodes is probably
> doable, but in a hyperconverged environment you

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread Christian Huebner
I filed this bug specifically for hyperconverged environments. Upgrading monitor nodes first and then upgrading separate OSD nodes is probably doable, but in a hyperconverged environment you cannot separate them. I tried do-release-upgrade (a couple of times) without rebooting at the end, but found

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread James Page
Other ideas - could impacted users please validate networking, especially MTU configuration, between machines in their cluster before, during and after the upgrade. Ceph can be very sensitive to MTU mismatches and just hang when stuff is not quite right.

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread James Page
Marking 'Incomplete' for now as unable to reproduce.

** Changed in: ceph (Ubuntu)
       Status: In Progress => Incomplete

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-22 Thread James Page
Testing phase 2 - three machine all-in-one deploy.

Deployed using eoan - mon, mgr and 1 x osd on each machine.
Deployment seeded with pools and lightweight test data - rbds in each pool.
Each machine upgraded in turn (1, 2 and then 0) using do-release-upgrade.
ceph versions checked throughout

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-21 Thread James Page
As a side note - even if there is a bug here (and it sounds like there might be) I would recommend placing the mon and mgr daemons in LXD containers on top of the machines hosting the OSDs - this will allow you to manage them independently from an upgrade process for both ceph upgrades and ubuntu

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-21 Thread James Page
OK, further fact discovery from my testing. I have a 6 machine cluster deployed - three machines host mon,mgr and three machines host osd. Upgrading the mon,mgr cluster first, followed by the three osd machines, using do-release-upgrade and allowing the tool to reboot the machine at the end resulted

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-21 Thread Jay Ring
You may need more than one node to reproduce the problem. I had a 3 node system. I ran do-release-upgrade on node 1. The OSDs on node 1 connected to the monitor quorum, which had un-upgraded monitors on hosts 2 & 3. The upgraded OSDs on node 1 immediately died and could not be revived.

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-21 Thread James Page
ceph-mon eoan->focal upgrade testing:

ceph-mon@`hostname` systemd units not restarted until reboot step of the upgrade process on each node; mixed version cluster operated as expected as each mon was upgraded.

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-21 Thread James Page
working on reproduction for debug and triage.

** Changed in: ceph (Ubuntu)
       Status: Confirmed => In Progress

** Changed in: ceph (Ubuntu)
     Assignee: (unassigned) => James Page (james-page)

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-18 Thread Jay Ring
Just writing in to confirm this bug. It's very serious. Lost a whole node. No real warning. Extremely frustrating.

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-18 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: ceph (Ubuntu)
       Status: New => Confirmed

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-05-05 Thread Christian Huebner
I accomplished the upgrade by marking all Ceph packages held, then digging through the dependency jungle to upgrade the packages afterwards. This is obviously not a production-ready way to do it, but at least Ceph Octopus is running on 20.04 now. This really needs to be fixed.

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-04-30 Thread Christian Huebner
One note on importance: If someone runs do-release-upgrade on a converged Ceph node, it will destroy the node. So far I have not seen any recovery procedure. The only reason I was able to rapidly redo the upgrade is because it runs on snapshots and thus can be recovered after destruction. This is

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-04-30 Thread Christian Huebner
I tried to do the upgrade by hand (disable all the services that can not be autostarted, do the upgrade (btw, a manpage has been moved from ceph-deploy to ceph-base and thus the apt upgrade fails. do-release-upgrade is using --force-overwrite for this, but that's not a clean solution). Solution

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-04-30 Thread Christian Huebner
I just shut down Ceph on all four nodes completely, then did the do-release-upgrade. Before the upgrade I verified that all Ceph services were down so I would be able to start them in the correct order. After the upgrade (without reboot!) I found that all Ceph services on all Ceph nodes had been

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-04-29 Thread Christian Huebner
I redid the whole upgrade:
* do-release-upgrade and finished without reboot (all 4 nodes)
** so ceph daemons should not have been restarted
* restarted all ceph mons sequentially
** verified I get octopus as min mon release
* restarted all ceph-mgrs sequentially
** verified that all ceph-mgr

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-04-27 Thread Dan Hill
The same guidelines apply to hyper-converged architectures. Package updates are not applied until their corresponding service restarts. Ceph packaging does not automatically restart any services. This is by design so you can safely install on a hyper-converged host, and then control the order in

Re: [Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-04-27 Thread Christian Huebner
This would work if all nodes have a single function only (mon, mgr, osd). I tried everything to update the monitors first, but due to the dependencies between the Ceph packages the monitors and mgr daemons can not simply be updated separately from the OSDs. What I don't get, though, is that once

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-04-24 Thread Dan Hill
Eoan packages Nautilus, while Focal packages Octopus:

 ceph | 14.2.2-0ubuntu3         | eoan
 ceph | 14.2.4-0ubuntu0.19.10.2 | eoan-security
 ceph | 14.2.8-0ubuntu0.19.10.1 | eoan-updates
 ceph | 15.2.1-0ubuntu1         | focal
 ceph | 15.2.1-0ubuntu2         |
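
The version table above can be read mechanically: the major number of the upstream version identifies the Ceph release. A short Python sketch of that mapping follows, covering only the releases named in this thread. The function name ceph_release is hypothetical, and version strings with a Debian epoch prefix (for example "1:14.2.2") are not handled.

```python
# Major upstream version -> Ceph release name, for the releases in this thread.
RELEASE_BY_MAJOR = {
    12: "luminous",
    13: "mimic",
    14: "nautilus",
    15: "octopus",
}


def ceph_release(package_version: str) -> str:
    """Map an Ubuntu ceph package version to its Ceph release name."""
    upstream = package_version.split("-", 1)[0]  # drop the "-0ubuntu..." suffix
    major = int(upstream.split(".", 1)[0])
    return RELEASE_BY_MAJOR.get(major, "unknown")


if __name__ == "__main__":
    # Package versions copied from the table above.
    for version in ("14.2.2-0ubuntu3", "14.2.8-0ubuntu0.19.10.1", "15.2.1-0ubuntu1"):
        print(version, "->", ceph_release(version))
```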