It's hard to tell as I had to reboot the box in order to get it working
properly again.  I know that sendmail was not the first of the defunct
processes.  I do have triggers set, and I was receiving trigger email
while these defunct processes existed.  I do not have any gaps in the
graphs for any services, so jff was still running, though very slowly
due to these orphans.  I have enabled SNMP monitoring of the jff server
to try to track the defunct processes if they reappear.  If they come
back, I will add more information if possible.
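
For anyone who wants to watch for the same symptom without SNMP, here is a
minimal sketch of a local check. It assumes a Linux /proc filesystem; the
names parse_stat and count_defunct are hypothetical and are not part of
JFFNMS. It counts processes in state Z (defunct) and groups them by parent
PID, which may help show which process is failing to reap its children.

```python
import os


def parse_stat(stat_line):
    """Return (state, ppid) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' before reading fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    return fields[0], int(fields[1])


def count_defunct(proc_root="/proc"):
    """Map parent PID -> number of defunct (state 'Z') children."""
    by_parent = {}
    if not os.path.isdir(proc_root):
        return by_parent  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                state, ppid = parse_stat(f.read())
        except (OSError, IndexError, ValueError):
            continue  # process vanished or line was unreadable
        if state == "Z":
            by_parent[ppid] = by_parent.get(ppid, 0) + 1
    return by_parent


if __name__ == "__main__":
    for ppid, n in sorted(count_defunct().items()):
        print(f"parent {ppid}: {n} defunct children")
```

Run from cron, the output could be logged or graphed over time so that the
responsible parent can be identified before a reboot becomes necessary.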

Brad

On Tue, 2005-09-13 at 19:23 -0300, Javier Szyszlican wrote:
> If Sendmail was involved you should check what was happening to it.
> 
> Because if you have triggers set, JFFNMS consolidate will be blocked until it 
> can send the email and that can cause all sorts of issues.
> 
> Javier
> 
> Brad Hudson wrote:
> > There appears to be an issue with jffnms leaving orphans behind.  My
> > system was running fine for 31 days, but this morning there were over
> > 34,000 orphaned.  These processes were mostly php or fping with a
> > smattering of id and sendmail.  These defunct processes are unkillable
> > and brought my system to a crawl.  I had to reboot to clear them.
> > 
> > From what I understand the orphans are caused when a process forks a
> > child and then exits before the child finishes.  When this happens the
> > forked process' parent changes to init, which should normally clean them
> > up when they finish running.  When a process is already defunct before
> > init inherits it, init is unable to clean it up and it stays in the
> > system forever or until the system is rebooted.
> > 
> > The only reason I can think of for the child to be passed to init as
> > defunct is high load causing a delay between when the real parent exits
> > and the child finishes running with another delay between the parent
> > exiting and the child being passed to init.  The only solution I can see
> > to this is to ensure that all children have completed before allowing
> > the parent process to exit.  I do not know which specific module left
> > the orphans behind, but it must be one of the pollers as I am not
> > running auto discovery on any hosts.
> > 
> > Little problems like these are hardly enough to make me stop using this
> > great product.  Javier, you should be proud of it.  I just wanted to
> > bring it to your attention.
> > 
> > Regards;
> > 
> > Brad
> > 
> > 
> 
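
The fix proposed in the quoted message, having the parent wait for all of
its children before it exits, can be sketched roughly as follows. This is
an illustration only, written in Python for a POSIX system rather than the
PHP that JFFNMS uses; run_children and jobs are hypothetical names, not
anything from the JFFNMS pollers. A child that has exited remains defunct
until its parent collects its status with a wait call, so the sketch reaps
every child it forked before returning.

```python
import os


def run_children(jobs):
    """Fork one child per job, then reap every child before returning.

    Each job is a callable returning an integer exit code.
    Returns a dict mapping child PID -> exit code
    (negative signal number if the child was killed by a signal).
    """
    pids = []
    for job in jobs:
        pid = os.fork()
        if pid == 0:
            # Child: run the job and leave immediately, without running
            # the parent's cleanup handlers or flushing its buffers.
            try:
                code = int(job())
            except Exception:
                code = 1
            os._exit(code)
        pids.append(pid)

    results = {}
    for pid in pids:
        # Blocking wait: this is the step that removes the defunct entry.
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            results[pid] = os.WEXITSTATUS(status)
        else:
            results[pid] = -os.WTERMSIG(status)
    return results


if __name__ == "__main__":
    print(run_children([lambda: 0, lambda: 3]))
```

Waiting on each PID in turn keeps the example short. A long-running poller
would more likely reap children as they finish, for instance with
os.waitpid(-1, os.WNOHANG) in its main loop, so that one slow child does
not hold up the rest.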
-- 
Brad Hudson
Systems Analyst
[EMAIL PROTECTED]
Telephone: (613) 694-2681
Fax: (613) 759-1651


_______________________________________________
jffnms-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jffnms-users