Re: [Gluster-users] BUG: After stop and start wrong port is advertised

Atin Mukherjee Fri, 22 Sep 2017 06:20:58 -0700

I've already replied to your earlier email. In case you haven't seen it in
your mailbox, here it is:


This looks like a bug to me. For some reason glusterd's portmap is
referring to a stale port (IMO), whereas the brick is still listening on the
correct port. But ideally, when the glusterd service is restarted, the whole
in-memory portmap is rebuilt. Could you share the following details so that
we can start analysing it:

1. The glusterd statedump output from 192.168.140.43. You can use
   kill -SIGUSR2 <pid of glusterd>
   to request a statedump; the file will be available in /var/run/gluster.
2. The glusterd log file and the brick log file for
   192.168.140.43:/gluster/public, both from 192.168.140.43.
3. The cmd_history log file from all the nodes.
4. The contents of /var/lib/glusterd/vols/public/
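
To confirm the mismatch described above (glusterd advertising one port while
the brick process listens on another), the two outputs from the report can be
compared mechanically. The following is only a rough sketch in Python, not
part of gluster: the function names and the local_host parameter are
hypothetical, and it assumes each row of "gluster volume status" and
"netstat -tapn" sits on a single, unwrapped line.

```python
import re

# One brick row: "Brick <host:/path>  <tcp port>  <rdma port>  <online>  <pid>"
BRICK_ROW = re.compile(r"^Brick\s+(\S+)\s+(\d+)\s+\S+\s+[YN]\s+(\d+)\s*$")


def advertised_brick_ports(status_output):
    """Map 'host:/path' -> (tcp_port, pid) from `gluster volume status` text."""
    bricks = {}
    for line in status_output.splitlines():
        match = BRICK_ROW.match(line.strip())
        if match:  # rows with N/A for port or pid are skipped
            brick, port, pid = match.groups()
            bricks[brick] = (int(port), int(pid))
    return bricks


def listening_ports(netstat_output):
    """Map pid -> set of listening TCP ports from `netstat -tapn` text."""
    ports = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) >= 7 and fields[5] == "LISTEN" and "/" in fields[6]:
            pid = fields[6].split("/", 1)[0]
            if pid.isdigit():
                port = int(fields[3].rsplit(":", 1)[1])
                ports.setdefault(int(pid), set()).add(port)
    return ports


def stale_bricks(status_output, netstat_output, local_host):
    """Return local bricks whose advertised port is not one their pid listens on.

    netstat only sees the local node, so bricks of other hosts are ignored.
    """
    listening = listening_ports(netstat_output)
    stale = {}
    for brick, (port, pid) in advertised_brick_ports(status_output).items():
        if not brick.startswith(local_host + ":"):
            continue
        actual = listening.get(pid)
        if actual is not None and port not in actual:
            stale[brick] = (port, sorted(actual))
    return stale


if __name__ == "__main__":
    # Sample rows taken from the report, joined onto single lines.
    status = "Brick 192.168.140.43:/gluster/public        49154     0          Y       5913"
    netstat = "tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd"
    print(stale_bricks(status, netstat, "192.168.140.43"))
```

With the values from the report, this should flag the brick on 192.168.140.43
as advertised on 49154 while pid 5913 is listening on 49152. An empty result
would mean the advertised and listening ports agree on that node.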


On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <[email protected]>
wrote:

> Hi,
>
> We use glusterfs 3.10.5 on Debian 9.
>
>
>
> When we stop or restart the service, e.g.: service glusterfs-server restart
>
>
>
> We see that the wrong port gets advertised afterwards. For example:
>
>
>
> Before restart:
>
>
> Status of volume: public
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
> Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
> Self-heal Daemon on localhost               N/A       N/A        Y       5932
> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
>
> Task Status of Volume public
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> After restart of the service on one of the nodes (192.168.140.43) the port
> seems to have changed (but it didn't):
>
> root@app3:/var/log/glusterfs#  gluster volume status
> Status of volume: public
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
> Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
> Self-heal Daemon on localhost               N/A       N/A        Y       4628
> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
>
> Task Status of Volume public
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> However, the active process STILL has the same pid AND is still listening
> on the old port:
>
> root@app3:/var/log/glusterfs# netstat -tapn | grep gluster
> tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
>
>
> The other nodes' logs fill up with errors because they can't reach the
> daemon anymore. They try to reach it on the "new" port instead of the old
> one:
>
> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>
> So they now try 49154 instead of the old 49152
>
> Is this also by design? We had a lot of issues because of this recently.
> We don't understand why it starts advertising a completely wrong port after
> stop/start.
>
>
> Regards
>
> Jo Goossens
>
>
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://lists.gluster.org/mailman/listinfo/gluster-users
>

