That is great to know, Atin. Thank you for letting me know, and I'm happy to have helped. :) I'm looking forward to 3.12.5 now!
Cheers,

On Tue, Jan 23, 2018 at 10:36 AM Atin Mukherjee <[email protected]> wrote:

> 3.10 doesn't have this regression, so you're safe.
>
> On Tue, Jan 23, 2018 at 1:28 PM, Jo Goossens <[email protected]> wrote:
>
>> Hello,
>>
>> Will we also suffer from this regression in any of the (previously)
>> fixed 3.10 releases? We kept 3.10 and hope to stay stable :/
>>
>> Regards
>> Jo
>>
>> -----Original message-----
>> From: Atin Mukherjee <[email protected]>
>> Sent: Tue 23-01-2018 05:15
>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>> To: Alan Orth <[email protected]>;
>> CC: Jo Goossens <[email protected]>; [email protected];
>>
>> So from the logs it looks to be a regression caused by commit 635c1c3
>> (and the good news is that this is now fixed in the release-3.12 branch
>> and should be part of 3.12.5).
>>
>> Commit which fixes this issue:
>>
>> COMMIT: https://review.gluster.org/19146 committed in release-3.12 by
>> "Atin Mukherjee" <[email protected]> with a commit message:
>>
>> glusterd: connect to an existing brick process when qourum status is
>> NOT_APPLICABLE_QUORUM
>>
>> First of all, this patch reverts commit 635c1c3 as the same is causing
>> a regression with bricks not coming up on time when a node is rebooted.
>> This patch tries to fix the problem in a different way by just trying
>> to connect to an existing running brick when quorum status is not
>> applicable.
>>
>> mainline patch: https://review.gluster.org/#/c/19134/
>> Change-Id: I0efb5901832824b1c15dcac529bffac85173e097
>> BUG: 1511301
>> Signed-off-by: Atin Mukherjee <[email protected]>
>>
>> On Mon, Jan 22, 2018 at 3:15 PM, Alan Orth <[email protected]> wrote:
>>
>> Ouch! Yes, I see two port-related fixes in the GlusterFS 3.12.3 release
>> notes[0][1][2]. I've attached a tarball of all of yesterday's logs from
>> /var/log/glusterd on one of the affected nodes (called "wingu3"). I
>> hope that's what you need.
>>
>> [0] https://github.com/gluster/glusterfs/blob/release-3.12/doc/release-notes/3.12.3.md
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1507747
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1507748
>>
>> Thanks,
>>
>> On Mon, Jan 22, 2018 at 6:34 AM Atin Mukherjee <[email protected]> wrote:
>>
>> The patch was definitely there in 3.12.3. Do you have the glusterd and
>> brick logs handy from when this happened?
>>
>> On Sun, Jan 21, 2018 at 10:21 PM, Alan Orth <[email protected]> wrote:
>>
>> For what it's worth, I just updated some CentOS 7 servers from
>> GlusterFS 3.12.1 to 3.12.4 and hit this bug. Did the patch make it into
>> 3.12.4? I had to use Mike Hulsman's script to check the daemon port
>> against the port in the volume's brick info, update the port, and
>> restart glusterd on each node. Luckily I only have four servers! Hoping
>> I don't have to do this every time I reboot!
>>
>> Regards,
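For illustration, a minimal sketch of the kind of check-and-fix such a script performs. This is not Mike Hulsman's actual script; the volume name, the /var/lib/glusterd paths, and the listen-port key are assumptions about the glusterd 3.x on-disk layout, and it assumes a single brick process per node. Verify all of these on your own systems before running anything like it.

#!/bin/bash
# Sketch: compare the port glusterd advertises for a brick (stored in
# the brick info file) with the port the running glusterfsd process is
# actually listening on, and rewrite the stored port if they disagree.
VOLUME=public   # assumed volume name, taken from this thread

for brickfile in /var/lib/glusterd/vols/"$VOLUME"/bricks/*; do
    advertised=$(awk -F= '$1 == "listen-port" {print $2}' "$brickfile")
    # Port the local brick process is really bound to (first match only,
    # hence the single-brick-per-node assumption):
    actual=$(ss -tlnp | awk '/glusterfsd/ {n = split($4, a, ":"); print a[n]; exit}')
    [ -z "$advertised" ] && continue
    [ -z "$actual" ] && continue
    if [ "$advertised" != "$actual" ]; then
        echo "$brickfile: advertised $advertised, actually listening on $actual"
        sed -i "s/^listen-port=.*/listen-port=$actual/" "$brickfile"
        needs_restart=1
    fi
done

# Restart glusterd so it re-reads the corrected brick info
# (Debian 9: service glusterfs-server restart; CentOS 7: systemctl restart glusterd)
[ -n "$needs_restart" ] && systemctl restart glusterd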
>> On Sat, Dec 2, 2017 at 5:23 PM Atin Mukherjee <[email protected]> wrote:
>>
>> On Sat, 2 Dec 2017 at 19:29, Jo Goossens <[email protected]> wrote:
>>
>> Hello Atin,
>>
>> Could you confirm this should have been fixed in 3.10.8? If so we'll
>> test it for sure!
>>
>> Fix should be part of 3.10.8, which is awaiting release announcement.
>>
>> Regards
>> Jo
>>
>> -----Original message-----
>> From: Atin Mukherjee <[email protected]>
>> Sent: Mon 30-10-2017 17:40
>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>> To: Jo Goossens <[email protected]>;
>> CC: [email protected];
>>
>> On Sat, 28 Oct 2017 at 02:36, Jo Goossens <[email protected]> wrote:
>>
>> Hello Atin,
>>
>> I just read it and am very happy you found the issue. We really hope
>> this will be fixed in the next 3.10.7 version!
>>
>> 3.10.7 - no, I guess, as the patch is still in review and 3.10.7 is
>> getting tagged today. You'll get this fix in 3.10.8.
>>
>> PS: Wow, nice, all that C code and those "goto out" statements (not
>> always considered clean, but often the best way, I think). I can
>> remember the days I wrote kernel drivers myself in C :)
>>
>> Regards
>> Jo Goossens
>>
>> -----Original message-----
>> From: Atin Mukherjee <[email protected]>
>> Sent: Fri 27-10-2017 21:01
>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>> To: Jo Goossens <[email protected]>;
>> CC: [email protected];
>>
>> We (finally) figured out the root cause, Jo!
>>
>> Patch https://review.gluster.org/#/c/18579 posted upstream for review.
>>
>> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <[email protected]> wrote:
>>
>> Hi,
>>
>> We use glusterfs 3.10.5 on Debian 9.
>>
>> When we stop or restart the service (e.g. service glusterfs-server
>> restart), we see that the wrong port gets advertised afterwards. For
>> example:
>>
>> Before restart:
>>
>> Status of volume: public
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
>> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
>> Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
>> Self-heal Daemon on localhost               N/A       N/A        Y       5932
>> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
>> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
>>
>> Task Status of Volume public
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> After a restart of the service on one of the nodes (192.168.140.43),
>> the port seems to have changed (but it didn't):
>>
>> root@app3:/var/log/glusterfs# gluster volume status
>> Status of volume: public
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
>> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
>> Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
>> Self-heal Daemon on localhost               N/A       N/A        Y       4628
>> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
>> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
>>
>> Task Status of Volume public
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> However, the active process is STILL the same pid AND still listening
>> on the old port:
>>
>> root@app3:/var/log/glusterfs# netstat -tapn | grep gluster
>> tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
>>
>> The other nodes' logs fill up with errors because they can't reach the
>> daemon anymore.
>> They try to reach it on the "new" port instead of the old one:
>>
>> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>
>> So they now try 49154 instead of the old 49152.
>>
>> Is this also by design? We had a lot of issues because of this
>> recently. We don't understand why it starts advertising a completely
>> wrong port after stop/start.
>>
>> Regards
>> Jo Goossens
>>
>> --
>> - Atin (atinm)
>>
>> --
>> Alan Orth
>> [email protected]
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
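A quick way to confirm the mismatch described above is to compare the port glusterd advertises with the port the brick process is actually bound to, using only commands already shown in this thread (the volume name and node IP here are the ones from Jo's report):

# What glusterd advertises for the brick on this node:
gluster volume status public | grep 192.168.140.43

# What the brick process is actually listening on:
netstat -tapn | grep glusterfsd    # or: ss -tlnp | grep glusterfsd

If the two ports differ, clients are told to connect to a port nobody is listening on, which is exactly what the "Connection refused" log entries above show.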
--
Alan Orth
[email protected]
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch

_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
