So from the logs this looks to be a regression caused by commit 635c1c3, and the good news is that it is now fixed in the release-3.12 branch and should be part of 3.12.5.

The commit which fixes this issue is https://review.gluster.org/19146, committed in release-3.12 by "Atin Mukherjee" <[email protected]> with the commit message:

    glusterd: connect to an existing brick process when quorum status is
    NOT_APPLICABLE_QUORUM

    First of all, this patch reverts commit 635c1c3 as the same is causing
    a regression with bricks not coming up on time when a node is rebooted.
    This patch tries to fix the problem in a different way by just trying
    to connect to an existing running brick when quorum status is not
    applicable.

    > mainline patch: https://review.gluster.org/#/c/19134/

    Change-Id: I0efb5901832824b1c15dcac529bffac85173e097
    BUG: 1511301
    Signed-off-by: Atin Mukherjee <[email protected]>
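In essence, when quorum checks don't apply, glusterd now first checks whether a brick process that survived the restart is still alive and listening, and reattaches to it instead of spawning a new one (the respawn path is what appears to have clobbered the advertised port). A rough sketch of that logic in Python -- purely illustrative, since the real fix is C code inside glusterd, and the function names here are invented:

    import socket

    def attach_or_spawn_brick(host, recorded_port, spawn_brick):
        """Illustrative only: prefer reattaching to a surviving brick."""
        try:
            # A brick process survived the glusterd restart and is still
            # bound to its recorded port: reconnect to it so the advertised
            # port stays in sync with what the brick actually listens on.
            with socket.create_connection((host, recorded_port), timeout=2):
                return "attached to existing brick"
        except OSError:
            # Nothing is listening: start a brick the usual way.
            spawn_brick()
            return "spawned new brick"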
On Mon, Jan 22, 2018 at 3:15 PM, Alan Orth <[email protected]> wrote:

> Ouch! Yes, I see two port-related fixes in the GlusterFS 3.12.3 release
> notes [0][1][2]. I've attached a tarball of all of yesterday's logs from
> /var/log/glusterd on one of the affected nodes (called "wingu3"). I hope
> that's what you need.
>
> [0] https://github.com/gluster/glusterfs/blob/release-3.12/doc/release-notes/3.12.3.md
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1507747
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1507748
>
> Thanks,
>
> On Mon, Jan 22, 2018 at 6:34 AM Atin Mukherjee <[email protected]> wrote:
>
>> The patch was definitely there in 3.12.3. Do you have the glusterd and
>> brick logs handy with you from when this happened?
>>
>> On Sun, Jan 21, 2018 at 10:21 PM, Alan Orth <[email protected]> wrote:
>>
>>> For what it's worth, I just updated some CentOS 7 servers from
>>> GlusterFS 3.12.1 to 3.12.4 and hit this bug. Did the patch make it
>>> into 3.12.4? I had to use Mike Hulsman's script to check the daemon
>>> port against the port in the volume's brick info, update the port,
>>> and restart glusterd on each node. Luckily I only have four servers!
>>> Hoping I don't have to do this every time I reboot!
>>>
>>> Regards,
>>>
>>> On Sat, Dec 2, 2017 at 5:23 PM Atin Mukherjee <[email protected]> wrote:
>>>
>>>> On Sat, 2 Dec 2017 at 19:29, Jo Goossens <[email protected]> wrote:
>>>>
>>>>> Hello Atin,
>>>>>
>>>>> Could you confirm this should have been fixed in 3.10.8? If so,
>>>>> we'll test it for sure!
>>>>
>>>> The fix should be part of 3.10.8, which is awaiting its release
>>>> announcement.
>>>>
>>>>> Regards,
>>>>> Jo
>>>>>
>>>>> -----Original message-----
>>>>> From: Atin Mukherjee <[email protected]>
>>>>> Sent: Mon 30-10-2017 17:40
>>>>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>>>>> To: Jo Goossens <[email protected]>;
>>>>> CC: [email protected];
>>>>>
>>>>> On Sat, 28 Oct 2017 at 02:36, Jo Goossens <[email protected]> wrote:
>>>>>
>>>>> Hello Atin,
>>>>>
>>>>> I just read it and am very happy you found the issue. We really
>>>>> hope this will be fixed in the next 3.10.7 version!
>>>>>
>>>>> 3.10.7 - no, I guess, as the patch is still in review and 3.10.7 is
>>>>> getting tagged today. You'll get this fix in 3.10.8.
>>>>>
>>>>> PS: Wow, nice, all that C code and those "goto out" statements (not
>>>>> always considered clean, but often the best way, I think).
>>>>> I can remember the days when I wrote kernel drivers myself in C :)
>>>>>
>>>>> Regards,
>>>>> Jo Goossens
>>>>>
>>>>> -----Original message-----
>>>>> From: Atin Mukherjee <[email protected]>
>>>>> Sent: Fri 27-10-2017 21:01
>>>>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>>>>> To: Jo Goossens <[email protected]>;
>>>>> CC: [email protected];
>>>>>
>>>>> We (finally) figured out the root cause, Jo!
>>>>>
>>>>> Patch https://review.gluster.org/#/c/18579 posted upstream for review.
>>>>>
>>>>> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We use glusterfs 3.10.5 on Debian 9.
>>>>>
>>>>> When we stop or restart the service (e.g. service glusterfs-server
>>>>> restart), we see that the wrong port gets advertised afterwards.
>>>>> For example:
>>>>>
>>>>> Before restart:
>>>>>
>>>>> Status of volume: public
>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
>>>>> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
>>>>> Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       5932
>>>>> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
>>>>> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
>>>>>
>>>>> Task Status of Volume public
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>>
>>>>> After a restart of the service on one of the nodes (192.168.140.43),
>>>>> the port seems to have changed (but it didn't):
>>>>>
>>>>> root@app3:/var/log/glusterfs# gluster volume status
>>>>> Status of volume: public
>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
>>>>> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
>>>>> Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       4628
>>>>> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
>>>>> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
>>>>>
>>>>> Task Status of Volume public
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>>
>>>>> However, the active process is STILL the same pid AND still
>>>>> listening on the old port:
>>>>>
>>>>> root@app3:/var/log/glusterfs# netstat -tapn | grep gluster
>>>>> tcp        0      0 0.0.0.0:49152          0.0.0.0:*              LISTEN      5913/glusterfsd
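A quick way to confirm a mismatch like this from another node is to probe both ports directly. A minimal Python sketch, using the host and ports from the report above (the newly advertised port refuses connections while the brick's real port accepts them):

    import socket

    def port_open(host, port, timeout=2.0):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # 49154 is what glusterd now advertises; 49152 is what the surviving
    # brick process is actually bound to (per the netstat output above).
    print("advertised 49154 reachable:", port_open("192.168.140.43", 49154))
    print("old        49152 reachable:", port_open("192.168.140.43", 49152))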
>>>>> The other nodes' logs fill up with errors because they can't reach
>>>>> the daemon anymore. They try to reach it on the "new" port instead
>>>>> of the old one:
>>>>>
>>>>> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>>> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>>> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>>> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>>> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>>>
>>>>> So they now try 49154 instead of the old 49152.
>>>>>
>>>>> Is this also by design? We had a lot of issues because of this
>>>>> recently. We don't understand why it starts advertising a completely
>>>>> wrong port after stop/start.
>>>>>
>>>>> Regards,
>>>>> Jo Goossens
>>>>
>>>> --
>>>> - Atin (atinm)
>>>
>>> --
>>> Alan Orth
>>> [email protected]
>>> https://picturingjordan.com
>>> https://englishbulgaria.net
>>> https://mjanja.ch
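For reference, the check Alan mentions above (comparing the port glusterd advertises for each brick with the port the glusterfsd process is actually listening on) can be sketched in a few lines of Python. This is not Mike Hulsman's actual script, just the same idea; it assumes the gluster CLI and ss are on PATH, that it runs as root (so ss can show process names), and that the plain-text output matches the format shown in this thread:

    #!/usr/bin/env python3
    """Report bricks whose advertised port differs from the real one."""
    import re
    import subprocess

    def advertised(volume):
        """pid -> port, as reported by `gluster volume status <volume>`."""
        out = subprocess.run(["gluster", "volume", "status", volume],
                             capture_output=True, text=True, check=True).stdout
        # Matches online brick lines like:
        # Brick 192.168.140.43:/gluster/public  49154  0  Y  5913
        return {int(pid): int(port) for port, pid
                in re.findall(r"^Brick \S+\s+(\d+)\s+\d+\s+Y\s+(\d+)", out, re.M)}

    def listening():
        """pid -> port each local glusterfsd is really bound to, via `ss`."""
        out = subprocess.run(["ss", "-tlnp"],
                             capture_output=True, text=True, check=True).stdout
        return {int(pid): int(port) for port, pid
                in re.findall(r'LISTEN.*?:(\d+)\s.*?"glusterfsd",pid=(\d+)', out)}

    if __name__ == "__main__":
        actual = listening()
        # Only local bricks can match, since `ss` only sees local PIDs.
        for pid, port in advertised("public").items():  # volume from this thread
            if pid in actual and actual[pid] != port:
                print(f"pid {pid}: advertised {port}, "
                      f"actually listening on {actual[pid]}")

If a mismatch is reported, the workaround described in this thread was to correct the port in the volume's brick info and restart glusterd on that node; with the fix above this should no longer be necessary.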
