Thanks, good to know.

Kind regards,
Mike Hulsman
Proxy Managed Services B.V. | www.proxy.nl | Enterprise IT-Infra, Open Source and Cloud Technology
Delftweg 128, 3043 NB Rotterdam, The Netherlands | +31 10 307 0965

> From: "Atin Mukherjee" <[email protected]>
> To: "Mike Hulsman" <[email protected]>
> Cc: "Jo Goossens" <[email protected]>, "gluster-users" <[email protected]>
> Sent: Wednesday, November 8, 2017 2:12:02 PM
> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>
> We have a fix in the release-3.10 branch which is merged and should be available in the next 3.10 update.
>
> On Wed, Nov 8, 2017 at 4:58 PM, Mike Hulsman <[email protected]> wrote:
>
>> Hi,
>>
>> This bug is hitting me hard on two different clients: on RHGS 3.3 and on glusterfs 3.10.2 on CentOS 7.4.
>> In one case I had 59 differences in a total of 203 bricks.
>>
>> I wrote a quick and dirty script to check all ports against the brick file and the running process:
>>
>> #!/bin/bash
>> # Short hostname, volumes with local glusterfsd processes, and their ports.
>> Host=`uname -n | awk -F"." '{print $1}'`
>> GlusterVol=`ps -eaf | grep /usr/sbin/glusterfsd | grep -v grep | awk '{print $NF}' | awk -F"-server" '{print $1}' | sort | uniq`
>> Port=`ps -eaf | grep /usr/sbin/glusterfsd | grep -v grep | awk '{print $NF}' | awk -F"." '{print $NF}'`
>>
>> for Volumes in ${GlusterVol}; do
>>     cd /var/lib/glusterd/vols/${Volumes}/bricks
>>     Bricks=`ls ${Host}*`
>>     for Brick in ${Bricks}; do
>>         # Port recorded in the brick file vs. port in the running glusterfsd's options.
>>         Onfile=`grep ^listen-port "${Brick}"`
>>         BrickDir=`echo "${Brick}" | awk -F":" '{print $2}' | cut -c2-`
>>         Daemon=`ps -eaf | grep "${BrickDir}.pid" | grep -v grep | awk '{print $NF}' | awk -F"." '{print $2}'`
>>         #echo Onfile: ${Onfile}
>>         #echo Daemon: ${Daemon}
>>         if [ "${Onfile}" = "${Daemon}" ]; then
>>             echo "OK For ${Brick}"
>>         else
>>             echo "!!! NOT OK For ${Brick}"
>>         fi
>>     done
>> done
>>
>> Kind regards,
>>
>> Mike Hulsman
>> Proxy Managed Services B.V. | www.proxy.nl | Enterprise IT-Infra, Open Source and Cloud Technology
>> Delftweg 128, 3043 NB Rotterdam, The Netherlands | +31 10 307 0965
>>
>>> From: "Jo Goossens" <[email protected]>
>>> To: "Atin Mukherjee" <[email protected]>
>>> Cc: [email protected]
>>> Sent: Friday, October 27, 2017 11:06:35 PM
>>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>>>
>>> Hello Atin,
>>>
>>> I just read it and am very happy you found the issue. We really hope this will be fixed in the next 3.10.7 version!
>>>
>>> PS: Wow, nice, all that C code and those "goto out" statements (not always considered clean, but often the best way, I think). I can remember the days when I wrote kernel drivers in C myself :)
>>>
>>> Regards
>>>
>>> Jo Goossens
>>>
>>>> -----Original message-----
>>>> From: Atin Mukherjee <[email protected]>
>>>> Sent: Fri 27-10-2017 21:01
>>>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>>>> To: Jo Goossens <[email protected]>
>>>> Cc: [email protected]
>>>>
>>>> We (finally) figured out the root cause, Jo!
>>>>
>>>> Patch https://review.gluster.org/#/c/18579 posted upstream for review.
>>>>
>>>> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We use glusterfs 3.10.5 on Debian 9.
>>>>>
>>>>> When we stop or restart the service, e.g. service glusterfs-server restart,
>>>>> we see that the wrong port gets advertised afterwards.
>>>>>
>>>>> For example, before restart:
>>>>>
>>>>> Status of volume: public
>>>>> Gluster process                            TCP Port  RDMA Port  Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick 192.168.140.41:/gluster/public       49153     0          Y       6364
>>>>> Brick 192.168.140.42:/gluster/public       49152     0          Y       1483
>>>>> Brick 192.168.140.43:/gluster/public       49152     0          Y       5913
>>>>> Self-heal Daemon on localhost              N/A       N/A        Y       5932
>>>>> Self-heal Daemon on 192.168.140.42         N/A       N/A        Y       13084
>>>>> Self-heal Daemon on 192.168.140.41         N/A       N/A        Y       15499
>>>>>
>>>>> Task Status of Volume public
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>>
>>>>> After a restart of the service on one of the nodes (192.168.140.43), the port seems to have changed (but it didn't):
>>>>>
>>>>> root@app3:/var/log/glusterfs# gluster volume status
>>>>> Status of volume: public
>>>>> Gluster process                            TCP Port  RDMA Port  Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick 192.168.140.41:/gluster/public       49153     0          Y       6364
>>>>> Brick 192.168.140.42:/gluster/public       49152     0          Y       1483
>>>>> Brick 192.168.140.43:/gluster/public       49154     0          Y       5913
>>>>> Self-heal Daemon on localhost              N/A       N/A        Y       4628
>>>>> Self-heal Daemon on 192.168.140.42         N/A       N/A        Y       3077
>>>>> Self-heal Daemon on 192.168.140.41         N/A       N/A        Y       28777
>>>>>
>>>>> Task Status of Volume public
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>>
>>>>> However, the active process is STILL the same pid AND is still listening on the old port:
>>>>>
>>>>> [email protected]:/var/log/glusterfs# netstat -tapn | grep gluster
>>>>> tcp        0      0 0.0.0.0:49152       0.0.0.0:*       LISTEN      5913/glusterfsd
>>>>>
>>>>> The other nodes' logs fill up with errors because they can't reach the daemon anymore. They try to reach it on the "new" port instead of the old one:
>>>>>
>>>>> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>>> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>>> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>>> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>>> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
>>>>>
>>>>> So they now try 49154 instead of the old 49152. Is this also by design?
>>>>> We had a lot of issues because of this recently. We don't understand why it starts advertising a completely wrong port after stop/start.
>>>>>
>>>>> Regards
>>>>>
>>>>> Jo Goossens
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
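
For reference, a minimal sketch of the same consistency check written against the listening socket instead of the glusterfsd command line. It assumes the default glusterd state directory /var/lib/glusterd and that ss is available; the brick-file layout and the listen-port key are taken from the script quoted above, and the glob may need adjusting if bricks were registered by IP rather than hostname (as in this thread).

#!/bin/bash
# Compare the listen-port recorded in each local brick file (the port
# glusterd advertises to clients) with the ports glusterfsd actually has open.

host=$(uname -n | cut -d. -f1)

for brickfile in /var/lib/glusterd/vols/*/bricks/"${host}":*; do
    [ -e "${brickfile}" ] || continue

    # Port glusterd believes the brick uses and will hand out to clients.
    advertised=$(awk -F= '/^listen-port/ {print $2}' "${brickfile}")
    if [ -z "${advertised}" ]; then
        echo "??       ${brickfile##*/}: no listen-port recorded"
        continue
    fi

    # Is any glusterfsd really listening on that port?
    if ss -tlnp | grep '"glusterfsd"' | grep -q ":${advertised} "; then
        echo "OK       ${brickfile##*/} (port ${advertised})"
    else
        echo "MISMATCH ${brickfile##*/}: nothing listening on port ${advertised}"
    fi
done

Bricks reported as MISMATCH are the ones whose clients log the "Connection refused" errors shown above. Until the fix from https://review.gluster.org/#/c/18579 is in the running version, restarting the affected brick processes (for example with gluster volume start <volname> force, which respawns bricks that are not running as expected) may bring the advertised and actual ports back in line, though that workaround is not confirmed in this thread.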
