Hi again,

 Sorry! There's a mistake in the code below: the DIE signal is shown linked to
_sigterm, while in the real code it points to _sigdie. Just clarifying before
someone says "it can't work, you are pointing to the wrong method!" :D
 
 Alberto

 
---- On Mon, 24 Mar 2014 16:44:36 +0100
albertocurro<albertocu...@zoho.com> wrote ----

 > Hi Rocco, 
 >  
 >  many thanks for your quick answer! Unfortunately, the provided solution
 > only works partially. In some cases the "fork bomb" message still
 > appears :(
 >  
 >   One of those cases: under some configurations an nginx instance is
 > started, so our product writes the configuration file and starts
 > nginx pointing at that file. BUT, if the configuration file cannot be
 > written (the directory does not exist, etc.), the error is raised and
 > I have not found any way to handle it:
 >  
 > DEBUG - Created nginx temporary directory /opt/tmp/pull/instance1 
 > DEBUG - Created nginx configuration directory /opt/etc/pull/instance1 
 > DEBUG - Created nginx log directory /opt/log/pull/instance1 
 > DEBUG - creating nginx configfile for instance 1 in /opt/etc/pull/instance1 
 > === 13991 === !!! Kernel has 1 child process(es). 
 > === 13991 === !!! At least one child process is still running when 
 > POE::Kernel->run() is ready to return. 
 > === 13991 === !!! Be sure to use sig_child() to reap child processes. 
 > === 13991 === !!! In extreme cases, failure to reap child processes has 
 > === 13991 === !!! resulted in a slow 'fork bomb' that has halted systems. 
 > Could not open file: No such file or directory 
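
A minimal sketch of one way to keep this failure from turning into an uncaught die: create the missing directory first and return the error to the caller instead of dying. The helper name write_config and its return convention are hypothetical, not part of the product or of POE; it uses only the core File::Path and File::Basename modules.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Hypothetical helper: write $content to $path, creating the parent
# directory first. Returns undef on success, or an error string.
sub write_config {
    my ($path, $content) = @_;
    my $dir = dirname($path);

    if (!-d $dir) {
        # With the "error" option make_path records failures in
        # $errors instead of dying.
        make_path($dir, { error => \my $errors });
        return "Could not create directory $dir" if $errors && @$errors;
    }

    open(my $fh, '>', $path) or return "Could not open $path: $!";
    print {$fh} $content;
    close($fh) or return "Could not close $path: $!";
    return;
}
```

A caller that gets an error string back can then stop and reap whatever it has already started before shutting down, rather than letting the exception escape.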
 >  
 >  I've added a DIE handler in the main session to try to handle this: 
 >  
 >  $sig_session = POE::Session->create( 
 >     inline_states => { 
 >         _start => sub { 
 >             $_[HEAP]{RELOADED} = 0; 
 >             $_[KERNEL]->sig(TERM => '_sigterm'); 
 >             $_[KERNEL]->sig(INT => '_sigterm'); 
 >             $_[KERNEL]->sig(DIE => '_sigterm'); 
 >             $_[KERNEL]->sig(nginx_reload => '_sig_nginx_reload'); 
 >             $_[KERNEL]->alias_set('sighandler'); 
 >         }, 
 >         _sigdie => sub { 
 >             print "Handling exception, calling stop"; 
 >             POE::Kernel->call($sig_session, '_stop'); 
 >         }, 
 >         _stop => sub { 
 >             # Reap any existing pid (# 1825119) 
 >             print "Handling stop"; 
 >             POE::Kernel->sig_child(); 
 >             use POSIX ":sys_wait_h"; 
 >             1 while waitpid(-1, WNOHANG) > 0;
 >  
 >             # Clear signal handlers... 
 >             $_[KERNEL]->sig('TERM'); 
 >  
 > But, as said above, it's not working. Checking POE's code, I can see that
 > the message lines are generated in Resources/Signals.pm, in the
 > _data_sig_finalize() method (where POE already does what you
 > recommended: waiting for the pid).
 >  
 > But _data_sig_finalize() is called in Kernel.pm just after all the
 > signals have been unregistered (Kernel.pm => _finalize_kernel):
 >  
 >  my $self = shift; 
 >  
 >   # Disable signal watching since there's now no place for them to go. 
 >   foreach ($self->_data_sig_get_safe_signals()) { 
 >     $self->loop_ignore_signal($_); 
 >   } 
 >  
 >   # Remove the kernel session's signal watcher. 
 >   $self->_data_sig_remove($self->ID, "IDLE"); 
 >  
 >   # The main loop is done, no matter which event library ran it. 
 >   # sig before loop so that it clears the signal_pipe file handler 
 >   $self->_data_sig_finalize(); 
 >   $self->loop_finalize(); 
 >  
 >  Once we get here, none of the signal handlers in my main session can
 > run, as the signals have already been unregistered. So how can I handle an
 > exception (die) raised while POE::Kernel->run() is running?
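
A minimal sketch of the general pattern in plain Perl, outside POE: wrap the step that may die in eval, then stop and reap the known children before reporting the error. The helper name run_with_cleanup and the idea of passing the child PIDs in explicitly are assumptions made for illustration. Since the check quoted above runs inside POE's own finalization, the assumption here is that the eval would wrap the failing step inside an event handler rather than POE::Kernel->run() itself.

```perl
use strict;
use warnings;

# Hypothetical helper: run $work; whether or not it dies, stop and
# reap the children listed in @$pids, then return the error (or undef).
sub run_with_cleanup {
    my ($work, $pids) = @_;

    my $ok  = eval { $work->(); 1 };
    my $err = $ok ? undef : ($@ || 'unknown error');

    for my $pid (@$pids) {
        kill 'TERM', $pid;    # ask the child to exit
        waitpid($pid, 0);     # blocking reap; assumes the child honours TERM
    }
    return $err;
}
```

If a child ignores TERM, the blocking waitpid would hang, so a real implementation would probably need a timeout or a fallback to KILL.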
 >  
 >  Thanks a lot 
 >  Alberto 
 >  
 >  
 >  
 >  
 > ---- On Mon, 24 Mar 2014 13:45:45 +0100 Rocco Caputo  wrote ----
 >  
 > >Hi, Alberto.  
 > >  
 > >At program end time, POE runs a quick waitpid() check for child processes 
 > >that may have leaked. This check was added after a bug report where POE 
 > >locked up a server after several days of running. It turned out to be the 
 > >reporter's application, but it was hard to debug.  
 > >  
 > >Your program seems to have created two processes that it didn't reap: PIDs 
 > >5373 and 5374. The ideal solution is to reap those processes before 
 > >exiting. Your program can do this using POE::Kernel's sig_child() method.  
 > >  
 > >In some cases, a third-party library will create processes and not properly 
 > >clean them up. It can be impossible to solve this case without modifying 
 > >other people's code.  
 > >  
 > >If you just want to ignore the problem, this might do the trick. Put these 
 > >lines in your last _stop handler. They should reap the processes you've 
 > >leaked before POE's check:  
 > >  
 > >use POSIX ":sys_wait_h";  
 > >1 while waitpid(-1, WNOHANG) > 0;
 > >  
 > >It's a bit of a pain, but I think it's better to explicitly ignore the 
 > >problem than for it to go unnoticed by default.  
 > >  
 > >Please let me know whether that resolves your problem. It may not. For 
 > >example, the processes may still be open until an object is destroyed at 
 > >global destruction time.  
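
A minimal sketch of the non-blocking reaping loop described above, written as a standalone function so it can be tried outside POE. The function name reap_children is hypothetical. Note that Perl's waitpid takes the PID first and the flags second, so "any child, don't block" is waitpid(-1, WNOHANG).

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # exports WNOHANG

# Reap every child that has already exited, without blocking.
# Returns the list of PIDs that were reaped.
sub reap_children {
    my @reaped;
    # waitpid returns the PID of a reaped child, 0 if children exist
    # but none has exited yet, and -1 if there are no children at all.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        push @reaped, $pid;
    }
    return @reaped;
}
```

Because the loop never blocks, a child that is still running when it is called is left alone, which matches the caveat above that this may not catch every leaked process.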
 > >  
 > >--  
 > >Rocco Caputo   
 > >  
 > >On Mar 24, 2014, at 05:46, albertocurro  wrote:  
 > >  
 > >> Guys,  
 > >>  
 > >> We have a product developed using POE as its base framework, plus some
 > >> other tool libraries such as log4perl. Basically it is a forward proxy
 > >> composed of several modules, each of them comprising a POE::Session; all
 > >> of them share an internal queue of tasks to be performed. Each module
 > >> performs several tasks on initialization, and if anything goes wrong,
 > >> croak() is called to stop the service -> this is considered ok, since
 > >> croak() is only called during initialization, when validation is being
 > >> performed.
 > >>  
 > >> The product is stable and works really fine, but recently I updated POE 
 > >> to the latest version, and since then we can see this message in the 
 > >> logs:  
 > >>  
 > >> registering pdu failed: 263!  
 > >> === 5267 === 5 -> on_handle (from Handler/StoreRemote.pm at 87)  
 > >> === 5267 === 5 -> on_retry (from Handler/StoreRemote.pm at 141)  
 > >> === 5267 === 9 -> on_handle (from Handler/StoreRemote.pm at 87)  
 > >> === 5267 === 9 -> on_retry (from Handler/StoreRemote.pm at 141)  
 > >> === 5267 === !!! Kernel has child processes.  
 > >> === 5267 === !!! Stopped child process (PID 5373) reaped when 
 > >> POE::Kernel->run() is ready to return.  
 > >> === 5267 === !!! Stopped child process (PID 5374) reaped when 
 > >> POE::Kernel->run() is ready to return.  
 > >> === 5267 === !!! At least one child process is still running when 
 > >> POE::Kernel->run() is ready to return.  
 > >> === 5267 === !!! Be sure to use sig_child() to reap child processes.  
 > >> === 5267 === !!! In extreme cases, failure to reap child processes has  
 > >> === 5267 === !!! resulted in a slow 'fork bomb' that has halted systems.  
 > >> mkdir /mnt/nfs99: Permission denied at Handler/Store.pm line 147  
 > >>  
 > >> The first lines and the last line above are the errors themselves, but
 > >> this part is new since the upgrade:
 > >>  
 > >> === 5267 === !!! Kernel has child processes.  
 > >> === 5267 === !!! Stopped child process (PID 5373) reaped when 
 > >> POE::Kernel->run() is ready to return.  
 > >> === 5267 === !!! Stopped child process (PID 5374) reaped when 
 > >> POE::Kernel->run() is ready to return.  
 > >> === 5267 === !!! At least one child process is still running when 
 > >> POE::Kernel->run() is ready to return.  
 > >> === 5267 === !!! Be sure to use sig_child() to reap child processes.  
 > >> === 5267 === !!! In extreme cases, failure to reap child processes has  
 > >> === 5267 === !!! resulted in a slow 'fork bomb' that has halted systems.  
 > >>  
 > >> I see it every time the service is stopped because of an unhandled
 > >> condition, even when POE's event loop has already been running for hours.
 > >> It was not visible before, and I can't get rid of it in any way. I've
 > >> tried different ways to avoid it, with no luck.
 > >>  
 > >> Any advice or alternative approach on this?  
 > >>  
 > >> Many thanks  
 > >> Alberto  
 > >>  
 > >  
 > > 
 >  
 > 
