Hi Steve, you scared me there a bit because I've just put RHEL 6.5 + GlusterFS 3.4.2 into production.
However, I cannot see any such problem: I have no zombie processes, and executing the command in question, or any other, does not create zombies or cause other problems. Unfortunately, I'm not sure what could be causing this.
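One thing that might help narrow it down: strictly speaking, the "< <( ... )" at the end of that loop is process substitution, not a HERE document, and the process it spawns is a child of your shell, not of glusterd. Since only a parent process can reap (wait() on) its dead children, I'd first check who the parent of the zombies actually is. Plain ps/awk should tell you; nothing gluster-specific here:

    # list zombie processes together with their parent PID
    ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

    # then look up one of the parents, e.g. if the PPID is 1153
    ps -o pid,user,comm -p 1153

If the parent turns out to be glusterd itself, then glusterd is forking helpers to answer "volume status ... detail" and never wait()ing for them; no change to the nagios script will reap those, and it would be worth filing a bug. If the parent is the monitoring shell instead, you could try feeding the loop from a plain command substitution and a here-string rather than process substitution. An untested sketch, with a placeholder body standing in for the plugin's parsing:

    status=$(sudo gluster volume status "${VOLUME}" detail)
    while read -r line; do
        echo "parsed: ${line}"    # placeholder for the plugin's case statement
    done <<< "${status}"

With $( ) the shell waits for the gluster CLI to exit before the loop even starts, so there is no child left over to turn into a zombie.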
On Mon 24 Mar 2014 12:13:03, Steve Thomas wrote:
> Some further information:
>
> When I run the command "gluster volume status audio detail" I get the
> zombie process created... So it's not the HERE document as I previously
> thought; it's the command itself.
>
> Does this happen with anyone else?
>
> Thanks,
> Steve
>
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Steve Thomas
> Sent: 24 March 2014 11:55
> To: Carlos Capriotti
> Cc: [email protected]
> Subject: Re: [Gluster-users] Gluster 3.4.2 on Redhat 6.5
>
> Hi Carlos,
>
> Thanks for coming back to me... in response to your queries:
>
> The PID is low: 1153 for glusterd, with glusterfsd at 1168 and 2 x
> glusterfs at 1318 and 1319, so I'd agree; it doesn't seem that glusterd
> is crashing and being restarted.
>
> As of today, Monday morning, top is reporting 1398 glusterd zombie
> processes.
>
> I have this problem on all 4 of my gluster nodes, and all four are being
> monitored by the attached nagios plugin.
>
> In terms of testing, I've prevented nagios from running the attached
> check script and restarted the glusterd process using
> "service glusterd restart". I've let it run for a few hours and haven't
> yet seen any zombie processes created. This I think is good as, for
> whatever reason, it appears to point at the nagios check script being
> the problem.
>
> My next check was to run the nagios check once to see if it created a
> zombie process... it did. So I started looking at the script. I forced
> the script to exit after the first command, "gluster volume heal audio
> info", and no zombie process was created. This pointed me to the second
> command, which takes this form. I'm no expert on HERE documents in
> shell, but I think it may be causing the issue:
>
> while read -r line; do
>     field=($(echo $line))
>     case ${field[0]} in
>     Brick)
>         brick=${field[@]:2}
>         ;;
>     Disk)
>         key=${field[@]:0:3}
>         if [ "${key}" = "Disk Space Free" ]; then
>             freeunit=${field[@]:4}
>             unit=${freeunit: -2}
>             free=${freeunit%$unit}
>             if [ "$unit" != "GB" ]; then
>                 Exit UNKNOWN "Unknown disk space size $freeunit\n"
>             fi
>             if (( $(bc <<< "${free} < ${freegb}") == 1 )); then
>                 freegb=$free
>             fi
>         fi
>         ;;
>     Online)
>         online=${field[@]:2}
>         if [ "${online}" = "Y" ]; then
>             let $((bricksfound++))
>         else
>             errors=("${errors[@]}" "$brick offline")
>         fi
>         ;;
>     esac
> done < <( sudo gluster volume status ${VOLUME} detail)
>
> Anyone spot why this would be an issue?
>
> Thanks,
> Steve
>
>
> From: Carlos Capriotti [mailto:[email protected]]
> Sent: 22 March 2014 11:51
> To: Steve Thomas
> Cc: [email protected]
> Subject: Re: [Gluster-users] Gluster 3.4.2 on Redhat 6.5
>
> ok, let's see if we can gather more info.
>
> I am not a specialist, but you know... another pair of eyes.
>
> My system has a single glusterd process and it has a pretty low PID,
> meaning it has not crashed.
>
> What is the PID of your glusterd? How many zombie processes are reported
> by top?
>
> I've been running my preliminary tests with gluster for a little over a
> month now and have never seen this. My platform is CentOS 6.5, so I'd
> say it is pretty similar.
>
> From my perspective, even making gluster sweat, running some intense
> rsync jobs in parallel, and seeing glusterd AND glusterfs take 120% of
> processing time in top (each on one core), they never crashed.
>
> My zombie count, from top, is zero.
>
> On the other hand, I had one of my nodes, the other day, crashing a
> process every time I started a highly demanding task. Turns out I had
> (and still have) a hardware problem on one of the processors (or the
> main board; still undiagnosed).
>
> Do you have this problem on one node only?
>
> Any chance you have something special compiled into your kernel?
>
> Any particularly memory-hungry tweak in your sysctl?
>
> Sounds like the system, not gluster.
>
> KR,
>
> Carlos
>
>
> On Fri, Mar 21, 2014 at 10:29 PM, Steve Thomas
> <[email protected]> wrote:
> Hi all...
>
> Further investigation shows in excess of 500 glusterd zombie processes,
> and the count is continuing to climb on the box...
>
> Any suggestions? I'm happy to provide logs etc. to get to the bottom of
> this...
>
> _____________________________________________
> From: Steve Thomas
> Sent: 21 March 2014 13:21
> To: '[email protected]'
> Subject: Gluster 3.4.2 on Redhat 6.5
>
>
> Hi,
>
> I'm running Gluster 3.4.2 on Redhat 6.5 with 4 servers, with a brick on
> each. This brick is mounted locally and used by apache to serve audio
> files for an IVR system. Each of these audio files is typically around
> 80-100 KB.
>
> The system appears to be working OK in terms of health and status via
> the gluster CLI.
>
> The system is monitored by nagios, and there's a check for zombie
> processes and the gluster status. It appears that over a 24-hour period
> the number of zombie processes on the box has increased and is
> continually increasing. Investigating, these are "glusterd" processes.
>
> I'm making an assumption, but I'd suspect that the regular nagios checks
> are resulting in the increase in zombie processes, as they are querying
> the glusterd process. The commands that the nagios plugin runs are:
>
> # Check heal status
> gluster volume heal audio info
>
> # Check volume status
> gluster volume status audio detail
>
> Does anyone have any suggestions as to why glusterd is resulting in
> these zombie processes?
>
> Thanks for help in advance,
>
> Steve

--
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
