We have a fix merged in the release-3.10 branch; it should be available in the next 3.10 update.
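Until that update is out, a quick way to confirm whether a brick is affected is to compare the port glusterd advertises with the port the brick process is actually bound to. A minimal sketch follows (the volume name "public" and the netstat check are taken from the report below; the rest is illustrative and assumes one brick per node on Linux):

#!/bin/bash
# Sketch: flag a mismatch between the port glusterd advertises for this
# node's brick and the port the running glusterfsd is really listening on.
VOL=public                             # assumption: volume name from the report below
IP=$(hostname -I | awk '{print $1}')   # assumption: first address is the brick IP

# Third column of the matching "Brick" line in "gluster volume status".
advertised=$(gluster volume status "${VOL}" | \
    awk -v ip="${IP}" '$1 == "Brick" && index($2, ip) {print $3; exit}')

# Port the running glusterfsd is actually bound to (first match only).
actual=$(netstat -tlpn 2>/dev/null | \
    awk '/glusterfsd/ {n = split($4, a, ":"); print a[n]; exit}')

if [ "${advertised}" = "${actual}" ]; then
    echo "OK: brick advertised and listening on port ${advertised}"
else
    echo "MISMATCH: advertised ${advertised}, listening on ${actual}"
fi

If the ports differ, restarting the affected brick process so it re-registers with glusterd should work around the problem until the fixed build is installed, e.g. by killing that brick's glusterfsd and respawning it with "gluster volume start <volname> force".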
On Wed, Nov 8, 2017 at 4:58 PM, Mike Hulsman <[email protected]> wrote:

> Hi,
>
> This bug is hitting me hard on two different clients:
> in RHGS 3.3 and on glusterfs 3.10.2 on CentOS 7.4.
> In one case I had 59 differences in a total of 203 bricks.
>
> I wrote a quick and dirty script to check all ports against the brick file
> and the running process:
>
> #!/bin/bash
>
> Host=`uname -n | awk -F"." '{print $1}'`
> GlusterVol=`ps -eaf | grep /usr/sbin/glusterfsd | grep -v grep | awk '{print $NF}' | awk -F"-server" '{print $1}' | sort | uniq`
> Port=`ps -eaf | grep /usr/sbin/glusterfsd | grep -v grep | awk '{print $NF}' | awk -F"." '{print $NF}'`
>
> for Volumes in ${GlusterVol};
> do
>   cd /var/lib/glusterd/vols/${Volumes}/bricks
>   Bricks=`ls ${Host}*`
>   for Brick in ${Bricks};
>   do
>     # listen-port recorded in the brick file under /var/lib/glusterd
>     Onfile=`grep ^listen-port "${Brick}"`
>     BrickDir=`echo "${Brick}" | awk -F":" '{print $2}' | cut -c2-`
>     # listen-port the running glusterfsd was actually started with
>     Daemon=`ps -eaf | grep "\${BrickDir}.pid" | grep -v grep | awk '{print $NF}' | awk -F"." '{print $2}'`
>     #echo Onfile: ${Onfile}
>     #echo Daemon: ${Daemon}
>     if [ "${Onfile}" = "${Daemon}" ]; then
>       echo "OK For ${Brick}"
>     else
>       echo "!!! NOT OK For ${Brick}"
>     fi
>   done
> done
>
> Kind regards,
>
> Mike Hulsman
>
> Proxy Managed Services B.V. | www.proxy.nl | Enterprise IT-Infra, Open
> Source and Cloud Technology
> Delftweg 128, 3043 NB Rotterdam, The Netherlands | +31 10 307 0965
>
> ------------------------------
>
> *From: *"Jo Goossens" <[email protected]>
> *To: *"Atin Mukherjee" <[email protected]>
> *Cc: *[email protected]
> *Sent: *Friday, October 27, 2017 11:06:35 PM
> *Subject: *Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>
> Hello Atin,
>
> I just read it and am very happy you found the issue. We really hope this
> will be fixed in the next 3.10.7 version!
>
> PS: Wow, nice, all that C code and those "goto out" statements (not always
> considered clean, but often the best way, I think). I can still remember the
> days when I wrote kernel drivers in C myself. :)
>
> Regards,
>
> Jo Goossens
>
> -----Original message-----
> *From:* Atin Mukherjee <[email protected]>
> *Sent:* Fri 27-10-2017 21:01
> *Subject:* Re: [Gluster-users] BUG: After stop and start wrong port is advertised
> *To:* Jo Goossens <[email protected]>;
> *CC:* [email protected];
>
> We (finally) figured out the root cause, Jo!
>
> Patch https://review.gluster.org/#/c/18579 posted upstream for review.
>
> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <[email protected]> wrote:
>
> Hi,
>
> We use glusterfs 3.10.5 on Debian 9.
>
> When we stop or restart the service (e.g. service glusterfs-server restart),
> we see that the wrong port gets advertised afterwards.
> For example:
>
> Before restart:
>
> Status of volume: public
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
> Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
> Self-heal Daemon on localhost               N/A       N/A        Y       5932
> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
>
> Task Status of Volume public
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> After restart of the service on one of the nodes (192.168.140.43), the port
> seems to have changed (but it didn't):
>
> root@app3:/var/log/glusterfs# gluster volume status
> Status of volume: public
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
> Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
> Self-heal Daemon on localhost               N/A       N/A        Y       4628
> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
>
> Task Status of Volume public
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> However, the active process STILL has the same pid AND is still listening on
> the old port:
>
> root@app3:/var/log/glusterfs# netstat -tapn | grep gluster
> tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
>
> The other nodes' logs fill up with errors because they can't reach the
> daemon anymore; they try to reach it on the "new" port instead of the old
> one:
>
> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>
> So they now try 49154 instead of the old 49152.
>
> Is this also by design? We had a lot of issues because of this recently.
> We don't understand why it starts advertising a completely wrong port after
> a stop/start.
>
> Regards,
>
> Jo Goossens
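(For anyone retesting once the update lands: the same commands from the thread above make a quick sanity loop; "glusterfs --version" is the only addition here.)

glusterfs --version               # confirm the build includes the fix
service glusterfs-server restart  # the trigger from Jo's report
gluster volume status public      # ports glusterd advertises
netstat -tapn | grep gluster      # ports the brick processes are bound to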
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
