Hello,

 
 
Will we also be hit by this regression in any of the (previously) fixed 3.10 
releases? We stayed on 3.10 and hope to remain stable :/



Regards

Jo

 
 
-----Original message-----
From: Atin Mukherjee <[email protected]>
Sent: Tue 23-01-2018 05:15
Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
To: Alan Orth <[email protected]>
CC: Jo Goossens <[email protected]>; [email protected]
 
So from the logs this looks to be a regression caused by commit 635c1c3, and 
the good news is that this is now fixed in the release-3.12 branch and should 
be part of 3.12.5. The commit which fixes this issue:



COMMIT: https://review.gluster.org/19146 committed in release-3.12 by "Atin 
Mukherjee" <[email protected]> with the commit message:

glusterd: connect to an existing brick process when quorum status is 
NOT_APPLICABLE_QUORUM

First of all, this patch reverts commit 635c1c3 as the same is causing a 
regression with bricks not coming up on time when a node is rebooted. This 
patch tries to fix the problem in a different way by just trying to connect to 
an existing running brick when quorum status is not applicable.

>mainline patch : https://review.gluster.org/#/c/19134/

Change-Id: I0efb5901832824b1c15dcac529bffac85173e097
BUG: 1511301
Signed-off-by: Atin Mukherjee <[email protected]>







On Mon, Jan 22, 2018 at 3:15 PM, Alan Orth <[email protected]> wrote:
Ouch! Yes, I see two port-related fixes in the GlusterFS 3.12.3 release 
notes[0][1][2]. I've attached a tarball of all of yesterday's logs from 
/var/log/glusterd on one of the affected nodes (called "wingu3"). I hope that's 
what you need.

[0] 
https://github.com/gluster/glusterfs/blob/release-3.12/doc/release-notes/3.12.3.md
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1507747
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1507748
 Thanks,


On Mon, Jan 22, 2018 at 6:34 AM Atin Mukherjee <[email protected]> wrote:
The patch was definitely there in 3.12.3. Do you have the glusterd and brick 
logs handy from when this happened?

On Sun, Jan 21, 2018 at 10:21 PM, Alan Orth <[email protected]> wrote:
For what it's worth, I just updated some CentOS 7 servers from GlusterFS 3.12.1 
to 3.12.4 and hit this bug. Did the patch make it into 3.12.4? I had to use 
Mike Hulsman's script to check the daemon port against the port in the volume's 
brick info, update the port, and restart glusterd on each node. Luckily I only 
have four servers! Hoping I don't have to do this every time I reboot!
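For reference, the check boils down to roughly this (a C sketch of the idea, 
not Mike Hulsman's actual script; the brick info file under 
/var/lib/glusterd/vols/<volume>/bricks/ and its listen-port= key are my 
assumptions about glusterd's state directory layout):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Compare the port glusterd recorded for a brick against the port the
 * brick process really listens on.
 *
 * Usage: ./portcheck /var/lib/glusterd/vols/public/bricks/<brick-file> 49152
 */
int main(int argc, char **argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s <brick-info-file> <actual-port>\n", argv[0]);
        return 2;
    }

    FILE *f = fopen(argv[1], "r");
    if (!f) { perror(argv[1]); return 2; }

    int recorded = -1;
    char line[256];
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "listen-port=", 12) == 0) {
            recorded = atoi(line + 12);   /* port glusterd advertises */
            break;
        }
    }
    fclose(f);

    int actual = atoi(argv[2]);           /* e.g. taken from netstat/ss */
    if (recorded != actual) {
        printf("MISMATCH: info file says %d, brick listens on %d\n",
               recorded, actual);
        return 1;                         /* fix the file, restart glusterd */
    }
    printf("OK: %d\n", recorded);
    return 0;
}

If it reports a mismatch, the workaround is to update the recorded port and 
restart glusterd on that node.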
 Regards,

On Sat, Dec 2, 2017 at 5:23 PM Atin Mukherjee <[email protected]> wrote:
On Sat, 2 Dec 2017 at 19:29, Jo Goossens <[email protected]> wrote:
 

Hello Atin,

 
 
Could you confirm this should have been fixed in 3.10.8? If so, we'll test it 
for sure!

The fix should be part of 3.10.8, which is awaiting its release announcement.
  


Regards

Jo

 

 
-----Original message-----
From: Atin Mukherjee <[email protected]>
Sent: Mon 30-10-2017 17:40
Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
To: Jo Goossens <[email protected]>
CC: [email protected]
 

On Sat, 28 Oct 2017 at 02:36, Jo Goossens <[email protected]> wrote:
 

Hello Atin,

 
 
I just read it and am very happy you found the issue. We really hope this will 
be fixed in the upcoming 3.10.7 version!

3.10.7 - no, I'm afraid, as the patch is still in review and 3.10.7 is getting 
tagged today. You'll get this fix in 3.10.8.
  
 
 
PS: Wow, nice, all that C code and those "goto out" statements (not always 
considered clean, but often the best way, I think). I can remember the days 
when I wrote kernel drivers in C myself :)
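For anyone who hasn't seen the idiom, it looks roughly like this (a generic 
illustration of the pattern, not actual gluster code):

#include <stdio.h>
#include <stdlib.h>

/* "goto out" cleanup idiom: every error path jumps to a single label
 * that releases whatever was acquired so far. */
static int copy_file(const char *src, const char *dst) {
    int ret = -1;
    FILE *in = NULL, *out = NULL;
    char *buf = NULL;
    size_t n;

    in = fopen(src, "rb");
    if (!in)
        goto out;

    out = fopen(dst, "wb");
    if (!out)
        goto out;

    buf = malloc(4096);
    if (!buf)
        goto out;

    while ((n = fread(buf, 1, 4096, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            goto out;
    }

    ret = 0;   /* success */
out:
    /* Single exit point: safe even for partially set up state. */
    free(buf);
    if (out)
        fclose(out);
    if (in)
        fclose(in);
    return ret;
}

int main(int argc, char **argv) {
    return (argc == 3 && copy_file(argv[1], argv[2]) == 0) ? 0 : 1;
}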

 
 
Regards

Jo Goossens

 
 

 
-----Original message-----
From: Atin Mukherjee <[email protected]>
Sent: Fri 27-10-2017 21:01
Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
To: Jo Goossens <[email protected]>
CC: [email protected]
 
We (finally) figured out the root cause, Jo!
Patch https://review.gluster.org/#/c/18579 has been posted upstream for review.

On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <[email protected]> wrote:
 

Hi,

 
 
We use glusterfs 3.10.5 on Debian 9.

 
When we stop or restart the service, e.g.: service glusterfs-server restart

 
We see that the wrong port gets advertised afterwards. For example:

 
Before restart:

 
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       5932
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
 Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks
After a restart of the service on one of the nodes (192.168.140.43), the port 
seems to have changed (but it didn't):

root@app3:/var/log/glusterfs# gluster volume status
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       4628
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
 Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks
However, the active process is STILL the same pid AND is still listening on the 
old port:

root@app3:/var/log/glusterfs# netstat -tapn | grep gluster
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
The other nodes' logs fill up with errors because they can't reach the daemon 
anymore. They try to reach it on the "new" port instead of the old one:
[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 
0-public-client-2: connection to 192.168.140.43:49154 failed (Connection 
refused); disconnecting socket
[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 
0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 
0-public-client-2: connection to 192.168.140.43:49154 failed (Connection 
refused); disconnecting socket
[2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 
0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 
0-public-client-2: connection to 192.168.140.43:49154 failed (Connection 
refused); disconnecting socket
[2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 
0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 
0-public-client-2: connection to 192.168.140.43:49154 failed (Connection 
refused); disconnecting socket
[2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 
0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 
0-public-client-2: connection to 192.168.140.43:49154 failed (Connection 
refused); disconnecting socket
So they now try 49154 instead of the old 49152.

Is this also by design? We had a lot of issues because of this recently. We 
don't understand why it starts advertising a completely wrong port after a 
stop/start.
Regards

Jo Goossens

 
 
 
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
 
--
- Atin (atinm)
 
-- 
- Atin (atinm)
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users


-- 
Alan Orth
[email protected]
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch

 
 


-- 
Alan Orth
[email protected]
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch

 
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
