Hello Atin,

I just read it and am very happy that you found the issue. We really hope this will be fixed in the upcoming 3.10.7 release!
PS: Wow, nice, all that C code and those "goto out" statements (not always considered clean, but often the best way, I think). I can remember the days when I wrote kernel drivers in C myself :)

Regards
Jo Goossens

-----Original message-----
From: Atin Mukherjee <[email protected]>
Sent: Fri 27-10-2017 21:01
Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
To: Jo Goossens <[email protected]>;
CC: [email protected];

We (finally) figured out the root cause, Jo!

Patch https://review.gluster.org/#/c/18579 posted upstream for review.

On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <[email protected]> wrote:

Hi,

We use glusterfs 3.10.5 on Debian 9.

When we stop or restart the service, e.g.:

service glusterfs-server restart

we see that the wrong port gets advertised afterwards. For example:

Before restart:

Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       5932
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks

After a restart of the service on one of the nodes (192.168.140.43), the port seems to have changed (but it didn't):

root@app3:/var/log/glusterfs# gluster volume status
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       4628
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks

However, the active process STILL has the same pid AND is still listening on the old port:

[email protected]:/var/log/glusterfs# netstat -tapn | grep gluster
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd

The other nodes' logs fill up with errors because they can't reach the daemon anymore. They try to reach it on the "new" port instead of the old one:

[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket

So they now try 49154 instead of the old 49152.

Is this also by design? We had a lot of issues because of this recently. We don't understand why it starts advertising a completely wrong port after stop/start.

Regards
Jo Goossens

_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
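
For anyone who wants to check whether a node is affected, the comparison made by hand above (advertised port in `gluster volume status` versus the port the brick process really listens on in `netstat -tapn`) can be sketched in a few lines of Python. This is only a rough sketch, not anything shipped with Gluster: the function names are made up for illustration, the sample text is copied from the report, and matching bricks to sockets by PID alone is a simplifying assumption (PIDs from different nodes can collide, so it should only be trusted for bricks on the local node).

```python
import re

# Sample text taken from the report above (after the restart on 192.168.140.43).
VOLUME_STATUS = """\
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
"""

NETSTAT = """\
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
"""

# Brick rows: name, TCP port, RDMA port, online flag, PID.
# Rows with "N/A" as port or PID do not match and are skipped.
BRICK_RE = re.compile(r"^Brick\s+(\S+)\s+(\d+)\s+\S+\s+[YN]\s+(\d+)$")
# Listening TCP sockets owned by a glusterfsd process.
LISTEN_RE = re.compile(
    r"^tcp\S*\s+\d+\s+\d+\s+\S+:(\d+)\s+\S+\s+LISTEN\s+(\d+)/glusterfsd$"
)


def advertised_ports(status_text):
    """Map brick PID -> (brick name, advertised TCP port)."""
    bricks = {}
    for line in status_text.splitlines():
        m = BRICK_RE.match(line.strip())
        if m:
            bricks[int(m.group(3))] = (m.group(1), int(m.group(2)))
    return bricks


def listening_ports(netstat_text):
    """Map glusterfsd PID -> set of TCP ports it listens on."""
    ports = {}
    for line in netstat_text.splitlines():
        m = LISTEN_RE.match(line.strip())
        if m:
            ports.setdefault(int(m.group(2)), set()).add(int(m.group(1)))
    return ports


def find_mismatches(status_text, netstat_text):
    """Return (brick, advertised port, actual ports) where they disagree."""
    listening = listening_ports(netstat_text)
    mismatches = []
    for pid, (brick, port) in advertised_ports(status_text).items():
        # Assumption: only PIDs visible in the local netstat output are checked.
        if pid in listening and port not in listening[pid]:
            mismatches.append((brick, port, sorted(listening[pid])))
    return mismatches


if __name__ == "__main__":
    for brick, advertised, actual in find_mismatches(VOLUME_STATUS, NETSTAT):
        print(f"{brick}: advertised {advertised}, listening on {actual}")
```

With the sample data from the report, this flags only the brick on 192.168.140.43, which is advertised on 49154 while pid 5913 still listens on 49152.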
