Hi Steve, you scared me there a bit because I've just put RHEL 6.5 + GlusterFS 3.4.2 into production.
However, I cannot see any such problem: I have no zombie processes, and executing the command in question, or any other, does not create zombies or cause other problems. Unfortunately, I'm not sure what could be causing this.
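One thing that might help narrow it down: strictly speaking, the "< <( ... )" at the end of that loop is process substitution, not a HERE document, and the process it spawns is a child of your shell, not of glusterd. Since only a parent process can reap (wait() on) its dead children, I'd first check who the parent of the zombies actually is. Plain ps/awk should tell you; nothing gluster-specific here:

    # list zombie processes together with their parent PID
    ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

    # then look up one of the parents, e.g. if the PPID is 1153
    ps -o pid,user,comm -p 1153

If the parent turns out to be glusterd itself, then glusterd is forking helpers to answer "volume status ... detail" and never wait()ing for them; no change to the nagios script will reap those, and it would be worth filing a bug. If the parent is the monitoring shell instead, you could try feeding the loop from a plain command substitution and a here-string rather than process substitution. An untested sketch, with a placeholder body standing in for the plugin's parsing:

    status=$(sudo gluster volume status "${VOLUME}" detail)
    while read -r line; do
        echo "parsed: ${line}"    # placeholder for the plugin's case statement
    done <<< "${status}"

With $( ) the shell waits for the gluster CLI to exit before the loop even starts, so there is no child left over to turn into a zombie.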
On Mon 24 Mar 2014 12:13:03, Steve Thomas wrote:
> Some further information:
>
> When I run the command "gluster volume status audio detail" I get the
> zombie process created... So it's not the HERE document as I previously
> thought; it's the command itself.
>
> Does this happen with anyone else?
>
> Thanks,
> Steve
>
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Steve Thomas
> Sent: 24 March 2014 11:55
> To: Carlos Capriotti
> Cc: [email protected]
> Subject: Re: [Gluster-users] Gluster 3.4.2 on Redhat 6.5
>
> Hi Carlos,
>
> Thanks for coming back to me... in response to your queries:
>
> The PID is low: 1153 for glusterd, with glusterfsd at 1168 and 2 x
> glusterfs at 1318 and 1319, so I'd agree; it doesn't seem that glusterd
> is crashing and being restarted.
>
> As of today, Monday morning, top is reporting 1398 glusterd zombie
> processes.
>
> I have this problem on all 4 of my gluster nodes, and all four are being
> monitored by the attached nagios plugin.
>
> In terms of testing, I've prevented nagios from running the attached
> check script and restarted the glusterd process using
> "service glusterd restart". I've let it run for a few hours and haven't
> yet seen any zombie processes created. This I think is good as, for
> whatever reason, it appears to point at the nagios check script being
> the problem.
>
> My next check was to run the nagios check once to see if it created a
> zombie process... it did. So I started looking at the script. I forced
> the script to exit after the first command, "gluster volume heal audio
> info", and no zombie process was created. This pointed me to the second
> command, which takes this form. I'm no expert on HERE documents in
> shell, but I think it may be causing the issue:
>
> while read -r line; do
>     field=($(echo $line))
>     case ${field[0]} in
>     Brick)
>         brick=${field[@]:2}
>         ;;
>     Disk)
>         key=${field[@]:0:3}
>         if [ "${key}" = "Disk Space Free" ]; then
>             freeunit=${field[@]:4}
>             unit=${freeunit: -2}
>             free=${freeunit%$unit}
>             if [ "$unit" != "GB" ]; then
>                 Exit UNKNOWN "Unknown disk space size $freeunit\n"
>             fi
>             if (( $(bc <<< "${free} < ${freegb}") == 1 )); then
>                 freegb=$free
>             fi
>         fi
>         ;;
>     Online)
>         online=${field[@]:2}
>         if [ "${online}" = "Y" ]; then
>             let $((bricksfound++))
>         else
>             errors=("${errors[@]}" "$brick offline")
>         fi
>         ;;
>     esac
> done < <( sudo gluster volume status ${VOLUME} detail)
>
> Anyone spot why this would be an issue?
>
> Thanks,
> Steve
>
>
> From: Carlos Capriotti [mailto:[email protected]]
> Sent: 22 March 2014 11:51
> To: Steve Thomas
> Cc: [email protected]
> Subject: Re: [Gluster-users] Gluster 3.4.2 on Redhat 6.5
>
> ok, let's see if we can gather more info.
>
> I am not a specialist, but you know... another pair of eyes.
>
> My system has a single glusterd process and it has a pretty low PID,
> meaning it has not crashed.
>
> What is the PID of your glusterd? How many zombie processes are reported
> by top?
>
> I've been running my preliminary tests with gluster for a little over a
> month now and have never seen this. My platform is CentOS 6.5, so I'd
> say it is pretty similar.
>
> From my perspective, even making gluster sweat, running some intense
> rsync jobs in parallel, and seeing glusterd AND glusterfs take 120% of
> processing time in top (each on one core), they never crashed.
>
> My zombie count, from top, is zero.
>
> On the other hand, I had one of my nodes, the other day, crashing a
> process every time I started a highly demanding task. Turns out I had
> (and still have) a hardware problem on one of the processors (or the
> main board; still undiagnosed).
>
> Do you have this problem on one node only?
>
> Any chance you have something special compiled into your kernel?
>
> Any particularly memory-hungry tweak in your sysctl?
>
> Sounds like the system, not gluster.
>
> KR,
>
> Carlos
>
>
> On Fri, Mar 21, 2014 at 10:29 PM, Steve Thomas
> <[email protected]> wrote:
> Hi all...
>
> Further investigation shows in excess of 500 glusterd zombie processes,
> and the count is continuing to climb on the box...
>
> Any suggestions? I'm happy to provide logs etc. to get to the bottom of
> this...
>
> _____________________________________________
> From: Steve Thomas
> Sent: 21 March 2014 13:21
> To: '[email protected]'
> Subject: Gluster 3.4.2 on Redhat 6.5
>
>
> Hi,
>
> I'm running Gluster 3.4.2 on Redhat 6.5 with 4 servers, with a brick on
> each. This brick is mounted locally and used by apache to serve audio
> files for an IVR system. Each of these audio files is typically around
> 80-100 KB.
>
> The system appears to be working OK in terms of health and status via
> the gluster CLI.
>
> The system is monitored by nagios, and there's a check for zombie
> processes and the gluster status. It appears that over a 24-hour period
> the number of zombie processes on the box has increased and is
> continually increasing. Investigating, these are "glusterd" processes.
>
> I'm making an assumption, but I'd suspect that the regular nagios checks
> are resulting in the increase in zombie processes, as they are querying
> the glusterd process. The commands that the nagios plugin runs are:
>
> # Check heal status
> gluster volume heal audio info
>
> # Check volume status
> gluster volume status audio detail
>
> Does anyone have any suggestions as to why glusterd is resulting in
> these zombie processes?
>
> Thanks for help in advance,
>
> Steve

--
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
