That is great to know, Atin. Thank you for letting me know, and I'm happy to have helped. :) I'm looking forward to 3.12.5 now!
Cheers,

On Tue, Jan 23, 2018 at 10:36 AM Atin Mukherjee <[email protected]> wrote:

> 3.10 doesn't have this regression, so you're safe.
>
> On Tue, Jan 23, 2018 at 1:28 PM, Jo Goossens <[email protected]> wrote:
>
>> Hello,
>>
>> Will we also suffer from this regression in any of the (previously)
>> fixed 3.10 releases? We kept 3.10 and hope to stay stable :/
>>
>> Regards
>> Jo
>>
>> -----Original message-----
>> From: Atin Mukherjee <[email protected]>
>> Sent: Tue 23-01-2018 05:15
>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>> To: Alan Orth <[email protected]>;
>> CC: Jo Goossens <[email protected]>; [email protected];
>>
>> So from the logs it looks to be a regression caused by commit 635c1c3
>> (and the good news is that this is now fixed in the release-3.12 branch
>> and should be part of 3.12.5).
>>
>> Commit which fixes this issue:
>>
>> COMMIT: https://review.gluster.org/19146 committed in release-3.12 by
>> "Atin Mukherjee" <[email protected]> with a commit message:
>>
>> glusterd: connect to an existing brick process when qourum status is
>> NOT_APPLICABLE_QUORUM
>>
>> First of all, this patch reverts commit 635c1c3 as the same is causing
>> a regression with bricks not coming up on time when a node is rebooted.
>> This patch tries to fix the problem in a different way by just trying
>> to connect to an existing running brick when quorum status is not
>> applicable.
>>
>> mainline patch: https://review.gluster.org/#/c/19134/
>> Change-Id: I0efb5901832824b1c15dcac529bffac85173e097
>> BUG: 1511301
>> Signed-off-by: Atin Mukherjee <[email protected]>
>>
>> On Mon, Jan 22, 2018 at 3:15 PM, Alan Orth <[email protected]> wrote:
>>
>> Ouch! Yes, I see two port-related fixes in the GlusterFS 3.12.3 release
>> notes[0][1][2]. I've attached a tarball of all of yesterday's logs from
>> /var/log/glusterd on one of the affected nodes (called "wingu3"). I
>> hope that's what you need.
>>
>> [0] https://github.com/gluster/glusterfs/blob/release-3.12/doc/release-notes/3.12.3.md
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1507747
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1507748
>>
>> Thanks,
>>
>> On Mon, Jan 22, 2018 at 6:34 AM Atin Mukherjee <[email protected]> wrote:
>>
>> The patch was definitely there in 3.12.3. Do you have the glusterd and
>> brick logs handy from when this happened?
>>
>> On Sun, Jan 21, 2018 at 10:21 PM, Alan Orth <[email protected]> wrote:
>>
>> For what it's worth, I just updated some CentOS 7 servers from
>> GlusterFS 3.12.1 to 3.12.4 and hit this bug. Did the patch make it into
>> 3.12.4? I had to use Mike Hulsman's script to check the daemon port
>> against the port in the volume's brick info, update the port, and
>> restart glusterd on each node. Luckily I only have four servers! Hoping
>> I don't have to do this every time I reboot!
>>
>> Regards,
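For illustration, a minimal sketch of the kind of check-and-fix such a script performs. This is not Mike Hulsman's actual script; the volume name, the /var/lib/glusterd paths, and the listen-port key are assumptions about the glusterd 3.x on-disk layout, and it assumes a single brick process per node. Verify all of these on your own systems before running anything like it.

#!/bin/bash
# Sketch: compare the port glusterd advertises for a brick (stored in
# the brick info file) with the port the running glusterfsd process is
# actually listening on, and rewrite the stored port if they disagree.
VOLUME=public   # assumed volume name, taken from this thread

for brickfile in /var/lib/glusterd/vols/"$VOLUME"/bricks/*; do
    advertised=$(awk -F= '$1 == "listen-port" {print $2}' "$brickfile")
    # Port the local brick process is really bound to (first match only,
    # hence the single-brick-per-node assumption):
    actual=$(ss -tlnp | awk '/glusterfsd/ {n = split($4, a, ":"); print a[n]; exit}')
    [ -z "$advertised" ] && continue
    [ -z "$actual" ] && continue
    if [ "$advertised" != "$actual" ]; then
        echo "$brickfile: advertised $advertised, actually listening on $actual"
        sed -i "s/^listen-port=.*/listen-port=$actual/" "$brickfile"
        needs_restart=1
    fi
done

# Restart glusterd so it re-reads the corrected brick info
# (Debian 9: service glusterfs-server restart; CentOS 7: systemctl restart glusterd)
[ -n "$needs_restart" ] && systemctl restart glusterd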
>> On Sat, Dec 2, 2017 at 5:23 PM Atin Mukherjee <[email protected]> wrote:
>>
>> On Sat, 2 Dec 2017 at 19:29, Jo Goossens <[email protected]> wrote:
>>
>> Hello Atin,
>>
>> Could you confirm this should have been fixed in 3.10.8? If so we'll
>> test it for sure!
>>
>> Fix should be part of 3.10.8, which is awaiting release announcement.
>>
>> Regards
>> Jo
>>
>> -----Original message-----
>> From: Atin Mukherjee <[email protected]>
>> Sent: Mon 30-10-2017 17:40
>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>> To: Jo Goossens <[email protected]>;
>> CC: [email protected];
>>
>> On Sat, 28 Oct 2017 at 02:36, Jo Goossens <[email protected]> wrote:
>>
>> Hello Atin,
>>
>> I just read it and am very happy you found the issue. We really hope
>> this will be fixed in the next 3.10.7 version!
>>
>> 3.10.7 - no, I guess, as the patch is still in review and 3.10.7 is
>> getting tagged today. You'll get this fix in 3.10.8.
>>
>> PS: Wow, nice, all that C code and those "goto out" statements (not
>> always considered clean, but often the best way, I think). I can
>> remember the days I wrote kernel drivers myself in C :)
>>
>> Regards
>> Jo Goossens
>>
>> -----Original message-----
>> From: Atin Mukherjee <[email protected]>
>> Sent: Fri 27-10-2017 21:01
>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>> To: Jo Goossens <[email protected]>;
>> CC: [email protected];
>>
>> We (finally) figured out the root cause, Jo!
>>
>> Patch https://review.gluster.org/#/c/18579 posted upstream for review.
>>
>> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <[email protected]> wrote:
>>
>> Hi,
>>
>> We use glusterfs 3.10.5 on Debian 9.
>>
>> When we stop or restart the service (e.g. service glusterfs-server
>> restart), we see that the wrong port gets advertised afterwards. For
>> example:
>>
>> Before restart:
>>
>> Status of volume: public
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
>> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
>> Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
>> Self-heal Daemon on localhost               N/A       N/A        Y       5932
>> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
>> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
>>
>> Task Status of Volume public
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> After a restart of the service on one of the nodes (192.168.140.43),
>> the port seems to have changed (but it didn't):
>>
>> root@app3:/var/log/glusterfs# gluster volume status
>> Status of volume: public
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
>> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
>> Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
>> Self-heal Daemon on localhost               N/A       N/A        Y       4628
>> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
>> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
>>
>> Task Status of Volume public
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> However, the active process is STILL the same pid AND still listening
>> on the old port:
>>
>> root@app3:/var/log/glusterfs# netstat -tapn | grep gluster
>> tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
>>
>> The other nodes' logs fill up with errors because they can't reach the
>> daemon anymore.
>> They try to reach it on the "new" port instead of the old one:
>>
>> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>
>> So they now try 49154 instead of the old 49152.
>>
>> Is this also by design? We had a lot of issues because of this
>> recently. We don't understand why it starts advertising a completely
>> wrong port after stop/start.
>>
>> Regards
>> Jo Goossens
>>
>> --
>> - Atin (atinm)
>>
>> --
>> Alan Orth
>> [email protected]
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
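A quick way to confirm the mismatch described above is to compare the port glusterd advertises with the port the brick process is actually bound to, using only commands already shown in this thread (the volume name and node IP here are the ones from Jo's report):

# What glusterd advertises for the brick on this node:
gluster volume status public | grep 192.168.140.43

# What the brick process is actually listening on:
netstat -tapn | grep glusterfsd    # or: ss -tlnp | grep glusterfsd

If the two ports differ, clients are told to connect to a port nobody is listening on, which is exactly what the "Connection refused" log entries above show.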
--
Alan Orth
[email protected]
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch

_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
