Hello Atin,

 
 
I just read it and am very happy you found the issue. We really hope this will
be fixed in the upcoming 3.10.7 release!

 
 
PS: Wow, nice, all that C code and those "goto out" statements (not always
considered clean, but often the best way, I think). I can remember the days I
wrote kernel drivers in C myself :)

 
 
Regards

Jo Goossens

 
 

 
-----Original message-----
From:Atin Mukherjee <[email protected]>
Sent:Fri 27-10-2017 21:01
Subject:Re: [Gluster-users] BUG: After stop and start wrong port is advertised
To:Jo Goossens <[email protected]>; 
CC:[email protected]; 
 
We (finally) figured out the root cause, Jo!
The patch https://review.gluster.org/#/c/18579 has been posted upstream for review.

On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <[email protected]> wrote:
 

Hi,

 
 
We use glusterfs 3.10.5 on Debian 9.

 
When we stop or restart the service, e.g.: service glusterfs-server restart

 
We see that the wrong port gets advertised afterwards. For example:

 
Before restart:

 
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       5932
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks

After restart of the service on one of the nodes (192.168.140.43) the port
seems to have changed (but it didn't):

root@app3:/var/log/glusterfs# gluster volume status
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       4628
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks

However the active process is STILL the same pid AND still listening on the
old port

[email protected]:/var/log/glusterfs# netstat -tapn | grep gluster
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
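
The comparison described above (advertised port versus the port the brick process actually listens on) could be automated roughly as follows. This is only a sketch: the function names and the `local_host` parameter are hypothetical, and it assumes the two inputs are the plain-text outputs of `gluster volume status` and `netstat -tapn` in the column layout shown in this message. Only bricks on the local host are checked, because netstat only reports local processes.

```python
import re


def advertised_brick_ports(status_text):
    """Map 'host:/path' -> (tcp_port, pid) from `gluster volume status` text."""
    bricks = {}
    for line in status_text.splitlines():
        # Brick rows: Brick <host:path> <tcp port> <rdma port> <online> <pid>
        m = re.match(r"^Brick\s+(\S+)\s+(\d+)\s+\S+\s+\S+\s+(\d+)\s*$", line)
        if m:
            bricks[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return bricks


def listening_ports_by_pid(netstat_text):
    """Map pid -> set of listening TCP ports from `netstat -tapn` text."""
    ports = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Columns: proto recv-q send-q local foreign state pid/program
        if len(fields) >= 7 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            pid_str = fields[6].split("/", 1)[0]
            if pid_str.isdigit():
                port = int(fields[3].rsplit(":", 1)[1])
                ports.setdefault(int(pid_str), set()).add(port)
    return ports


def find_mismatches(status_text, netstat_text, local_host):
    """Return (brick, advertised_port, actual_ports) for local bricks whose
    advertised port is not among the ports their pid is listening on."""
    listening = listening_ports_by_pid(netstat_text)
    mismatches = []
    for brick, (port, pid) in advertised_brick_ports(status_text).items():
        if not brick.startswith(local_host + ":"):
            continue  # netstat cannot see processes on other nodes
        actual = listening.get(pid)
        if actual is not None and port not in actual:
            mismatches.append((brick, port, sorted(actual)))
    return mismatches
```

Fed with the outputs shown here and `local_host="192.168.140.43"`, the sketch would flag the brick that is advertised on 49154 while pid 5913 still listens on 49152.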

The other nodes' logs fill up with errors because they can't reach the daemon
anymore. They try to reach it on the "new" port instead of the old one:

[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket

So they now try 49154 instead of the old 49152.
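
To see at a glance which endpoint the clients keep failing on, the refused connections in such a log can be tallied per host and port. The following is a minimal sketch, not part of glusterfs: `refused_targets` is a hypothetical helper, and it assumes the raw client log as written to disk, one entry per line, with the message wording shown in the excerpt above.

```python
import re
from collections import Counter

# Matches the "connection to <host>:<port> failed (Connection refused)" part
# of the socket_connect_finish error lines quoted above.
REFUSED = re.compile(r"connection to (\S+):(\d+) failed \(Connection refused\)")


def refused_targets(log_text):
    """Count refused connection attempts per (host, port) in a client log."""
    counts = Counter()
    for m in REFUSED.finditer(log_text):
        counts[(m.group(1), int(m.group(2)))] += 1
    return counts
```

A result dominated by a port that no brick process is listening on points at the stale advertisement described in this report.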
 Is this also by design? We had a lot of issues because of this recently. We 
don't understand why it starts advertising a completely wrong port after 
stop/start.
     
Regards

Jo Goossens

 
 
 
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
 