I believe that the merge of PR 155 solved a problem, in the same way I
tried to solve the mail problem: Make sure that SIGCHLD is set to DEFAULT
and not to IGNORE.
But after researching a bit about daemonizing in Python, I see no
reason for the $SIG{CHLD} = 'IGNORE' statement to be a part of basic
daemonizing.
Hence, I believe that this statement might be the root cause of the
problem.
Without setting SIGCHLD to IGNORE, both the problem PR 155 solves, and the
mail problem I stumbled upon, will be gone.
This statement was added in this commit:
commit abb936a85f6bb09169329b8daf1d2b2c094e07f1
Fixes an issue where forked processes become defunct rather than
exiting cleanly.
I suggest to solve the problem of zombie processes with a double-fork
(Double-forking is a common technique in daemonizing, used to make sure the
daemon is completely detached from it's parent):
1. Parent forks first child
2. Parent does waitpid()
3. First child forks the daemon
4. First child exits with a return code indicating the success of the
second fork.
5. Parent gets this return code and knows that the second fork went OK.
6. Because the first child is dead, the parent of the daemon is set to 1
(init), so when the daemon exits, it does not become a zombie (init
reclaims it).
This way the SIGCHLD handler is not affected, in both parent and daemon.
Comments ?
Thanks, Oren.
On Tue, Feb 7, 2017 at 10:40 PM, Dave Neuman <[email protected]> wrote:
> Thanks Jeremy. We did merge that PR not too long ago to fix one issue, but
> it looks like it created another. I agree that it would be better to use a
> different library.
>
> On Tue, Feb 7, 2017 at 1:29 PM, Jeremy Mitchell <[email protected]>
> wrote:
>
> > I'll have to defer to someone else as most of that was way over my head
> but
> > I do know we have this ticket outstanding:
> > https://issues.apache.org/jira/browse/TC-74
> >
> > and also, this PR was merged a while back regarding $SIG{CHLD} -
> > https://github.com/apache/incubator-trafficcontrol/pull/155
> >
> > I'm not really sure if it's a good or bad idea to wrap every call to
> > $self->mail(...);
> > as you suggested. maybe others have thoughts on that.
> >
> > preferably, it would be nice to get TC-74 done and remove our dependency
> on
> > an outdated library.
> >
> > jeremy
> >
> >
> > On Tue, Feb 7, 2017 at 4:27 AM, Oren Shemesh <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I'm looking for someone to form an opinion about a problem I believe I
> > have
> > > found, and a fix.
> > >
> > > The problem manifests itself as an error when TO attempts to send an
> > > E-mail.
> > >
> > > The details:
> > > When traffic ops runs, it daemonizes itself, and as part of that, it
> does
> > > $SIG{CHLD} = 'IGNORE'; :
> > >
> > > https://github.com/apache/incubator-trafficcontrol/blob/
> > > master/traffic_ops/app/lib/MojoPlugins/Daemonize.pm#L28
> > >
> > > But, when sending a mail, TO uses Mojolicous::Plugin::Mail, which in
> turn
> > > is a wrapper for the standard (And Obsolete) perl module MIME::Lite.
> > >
> > > MIME::Lite sends a mail by running the following piece of code (Snippet
> > > from app/local/lib/perl5/MIME/Lite.pm):
> > >
> > > my $pid = open SENDMAIL, "|-";
> > > defined($pid) or die "open of pipe failed: $!\n";
> > > if ( !$pid ) { ### child
> > > exec(@cmd) or die "can't exec $p{Sendmail}: $!\n";
> > > ### NOTREACHED
> > > } else { ### parent
> > > $self->print( \*SENDMAIL );
> > > close SENDMAIL || die "error closing $p{Sendmail}: $! (exit
> > > $?)\n";
> > > $return = 1;
> > > }
> > >
> > > The problem with this code is that assumes that the SIGCHLD handler is
> > left
> > > at it's DEFAULT handling, which means that when a child process
> > terminates,
> > > the parent process can reap it during the 'close' statement.
> > > Traffic Ops runs with the SIGCHLD handler set to IGNORE, which means
> that
> > > when a child process dies, the parent process forgets about it
> > immediately.
> > > So, if sendmail runs fast enough and finishes before the 'close'
> command
> > is
> > > executed, the close will fail to find the child process, and the error
> > > propagates all the way to the calling code, looking something like
> this:
> > >
> > > Traffic Ops fatal error occurred while processing your request.
> > > Error at line 2732 ( close SENDMAIL || die "error closing $p{Sendmail}:
> > $!
> > > (exit $?)\n";)
> > > error closing /usr/lib/sendmail: No child processes (exit -1)
> > >
> > > It should be noted that we are running TO on single-core machines for
> > > development, which means that the code is much more likely to fail in
> > this
> > > manner (In machines with more cores, the parent process usually does
> not
> > > get suspended when sendmail is spawned, hence it reaches the 'close'
> > > command immediately and waits for sendmail to finish properly).
> > >
> > > I am not sure why TO needs to completely ignore SIGCHLD, and I am not
> > sure
> > > if it is reasonable to ask from a module such as MIME::Lite to work
> when
> > > SIGCHLD is set to IGNORE.
> > > However, MIME::Lite is not being maintained any more (The owner
> > recommends
> > > using better modules, see this
> > > <http://search.cpan.org/~rjbs/MIME-Lite-3.030/lib/MIME/Lite.pm#WAIT
> !>),
> > > and
> > > the latest Mojolicous::Plugin::Mail version still uses MIME::Lite.
> > > (I guess Mojolicous is not very actively maintained these days).
> > > Which means that until we switch to the experimental, Angular-based,
> > > Traffic Ops, we are stuck with the current MIME::Lite code.
> > >
> > > Maybe a good fix for this issue is to not set SIGCHLD to IGNORE while
> > > daemonizing. However, this means that every time we create a new
> process,
> > > we must waitpid() for it, otherwise we will accumulate zombies.
> > > I guess it is not very practical to ensure this across the entire code
> > base
> > > of TO (And, as we have seen, all Perl modules that it uses).
> > >
> > > So the low-footprint fix I want to make is to wrap every call to
> > > $self->mail(...) done in our Email.pm
> > > <https://github.com/apache/incubator-trafficcontrol/blob/
> > > master/traffic_ops/app/lib/MojoPlugins/Email.pm>
> > > code, like this:
> > >
> > > $SIG{CHLD} = 'DEFAULT';
> > > $self->mail(...);
> > > $SIG{CHLD} = 'IGNORE';
> > >
> > > I tested this, and it fixes the problem.
> > >
> > > Thoughts ?
> > >
> > > I plan to create a JIRA and a PR, but I wanted to get someone's more
> > expert
> > > view on this before I go on.
> > > Being a bit of a novice, maybe I am missing some key thing here.
> > >
> > > Thanks ,Oren.
> > > --
> > >
> > > *Oren Shemesh*
> > > Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 |
> [email protected]
> > > <[email protected]>
> > >
> >
>
--
*Oren Shemesh*
Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | [email protected]
<[email protected]>