Sure, will do.

On Feb 8, 2017 3:53 PM, "Dave Neuman" <[email protected]> wrote:

> Can you create an issue and submit a PR so we can take a look at it?
> Thanks,
> Dave
>
> On Tue, Feb 7, 2017 at 10:03 PM, Oren Shemesh <[email protected]> wrote:
>
> > Yes, that is what I am suggesting.
> >
> > I believe it will make the code in PR 155 redundant, as well.
> >
> >
> > On Feb 8, 2017 00:50, "Jeremy Mitchell" <[email protected]> wrote:
> >
> > So you're suggesting a different solution that this?
> >
> > https://github.com/apache/incubator-trafficcontrol/commit/
> > abb936a85f6bb09169329b8daf1d2b2c094e07f1
> >
> > you're suggesting that the code in that commit be reverted and a
> > double-fork be pursued instead?
> >
> > jeremy
> >
> > On Tue, Feb 7, 2017 at 3:33 PM, Oren Shemesh <[email protected]> wrote:
> >
> > > I  believe that the merge of PR 155 solved a problem, in the same way I
> > > tried to solve the mail problem: Make sure that SIGCHLD is set to
> DEFAULT
> > > and not to IGNORE.
> > >
> > > But after researching a bit about daemonizing in Python, I see no
> > > reason for the $SIG{CHLD} = 'IGNORE' statement to be a part of basic
> > > daemonizing.
> > > Hence, I believe that this statement might be the root cause of the
> > > problem.
> > > Without setting SIGCHLD to IGNORE, both the problem PR 155 solves, and
> > the
> > > mail problem I stumbled upon, will be gone.
> > >
> > > This statement was added in this commit:
> > >
> > > commit abb936a85f6bb09169329b8daf1d2b2c094e07f1
> > >
> > >     Fixes an issue where forked processes become defunct rather than
> > > exiting cleanly.
> > >
> > > I suggest to solve the problem of zombie processes with a double-fork
> > > (Double-forking is a common technique in daemonizing, used to make sure
> > the
> > > daemon is completely detached from it's parent):
> > >
> > > 1. Parent forks first child
> > > 2. Parent does waitpid()
> > > 3. First child forks the daemon
> > > 4. First child exits with a return code indicating the success of the
> > > second fork.
> > > 5. Parent gets this return code and knows that the second fork went OK.
> > > 6. Because the first child is dead, the parent of the daemon is set to
> 1
> > > (init), so when the daemon exits, it does not become a zombie (init
> > > reclaims it).
> > >
> > > This way the SIGCHLD handler is not affected, in both parent and
> daemon.
> > >
> > > Comments ?
> > >
> > > Thanks, Oren.
> > >
> > >
> > >
> > > On Tue, Feb 7, 2017 at 10:40 PM, Dave Neuman <[email protected]>
> wrote:
> > >
> > > > Thanks Jeremy.  We did merge that PR not too long ago to fix one
> issue,
> > > but
> > > > it looks like it created another.  I agree that it would be better to
> > > use a
> > > > different library.
> > > >
> > > > On Tue, Feb 7, 2017 at 1:29 PM, Jeremy Mitchell <
> [email protected]
> > >
> > > > wrote:
> > > >
> > > > > I'll have to defer to someone else as most of that was way over my
> > head
> > > > but
> > > > > I do know we have this ticket outstanding:
> > > > > https://issues.apache.org/jira/browse/TC-74
> > > > >
> > > > > and also, this PR was merged a while back regarding $SIG{CHLD} -
> > > > > https://github.com/apache/incubator-trafficcontrol/pull/155
> > > > >
> > > > > I'm not really sure if it's a good or bad idea to wrap every call
> to
> > > > > $self->mail(...);
> > > > > as you suggested. maybe others have thoughts on that.
> > > > >
> > > > > preferably, it would be nice to get TC-74 done and remove our
> > > dependency
> > > > on
> > > > > an outdated library.
> > > > >
> > > > > jeremy
> > > > >
> > > > >
> > > > > On Tue, Feb 7, 2017 at 4:27 AM, Oren Shemesh <[email protected]>
> > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm looking for someone to form an opinion about a problem I
> > believe
> > > I
> > > > > have
> > > > > > found, and a fix.
> > > > > >
> > > > > > The problem manifests itself as an error when TO attempts to send
> > an
> > > > > > E-mail.
> > > > > >
> > > > > > The details:
> > > > > > When traffic ops runs, it daemonizes itself, and as part of that,
> > it
> > > > does
> > > > > > $SIG{CHLD} = 'IGNORE'; :
> > > > > >
> > > > > > https://github.com/apache/incubator-trafficcontrol/blob/
> > > > > > master/traffic_ops/app/lib/MojoPlugins/Daemonize.pm#L28
> > > > > >
> > > > > > But, when sending a mail, TO uses Mojolicous::Plugin::Mail, which
> > in
> > > > turn
> > > > > > is a wrapper for the standard (And Obsolete) perl module
> > MIME::Lite.
> > > > > >
> > > > > > MIME::Lite sends a mail by running the following piece of code
> > > (Snippet
> > > > > > from app/local/lib/perl5/MIME/Lite.pm):
> > > > > >
> > > > > >        my $pid = open SENDMAIL, "|-";
> > > > > >         defined($pid) or die "open of pipe failed: $!\n";
> > > > > >         if ( !$pid ) {    ### child
> > > > > >             exec(@cmd) or die "can't exec $p{Sendmail}: $!\n";
> > > > > >             ### NOTREACHED
> > > > > >         } else {          ### parent
> > > > > >             $self->print( \*SENDMAIL );
> > > > > >             close SENDMAIL || die "error closing $p{Sendmail}: $!
> > > (exit
> > > > > > $?)\n";
> > > > > >             $return = 1;
> > > > > >         }
> > > > > >
> > > > > > The problem with this code is that assumes that the SIGCHLD
> handler
> > > is
> > > > > left
> > > > > > at it's DEFAULT handling, which means that when a child process
> > > > > terminates,
> > > > > > the parent process can reap it during the 'close' statement.
> > > > > > Traffic Ops runs with the SIGCHLD handler set to IGNORE, which
> > means
> > > > that
> > > > > > when a child process dies, the parent process forgets about it
> > > > > immediately.
> > > > > > So, if sendmail runs fast enough and finishes before the 'close'
> > > > command
> > > > > is
> > > > > > executed, the close will fail to find the child process, and the
> > > error
> > > > > > propagates all the way to the calling code, looking something
> like
> > > > this:
> > > > > >
> > > > > > Traffic Ops fatal error occurred while processing your request.
> > > > > > Error at line 2732 ( close SENDMAIL || die "error closing
> > > $p{Sendmail}:
> > > > > $!
> > > > > > (exit $?)\n";)
> > > > > > error closing /usr/lib/sendmail: No child processes (exit -1)
> > > > > >
> > > > > > It should be noted that we are running TO on single-core machines
> > for
> > > > > > development, which means that the code is much more likely to
> fail
> > in
> > > > > this
> > > > > > manner (In machines with more cores, the parent process usually
> > does
> > > > not
> > > > > > get suspended when sendmail is spawned, hence it reaches the
> > 'close'
> > > > > > command immediately and waits for sendmail to finish properly).
> > > > > >
> > > > > > I am not sure why TO needs to completely ignore SIGCHLD, and I am
> > not
> > > > > sure
> > > > > > if it is reasonable to ask from a module such as MIME::Lite to
> work
> > > > when
> > > > > > SIGCHLD is set to IGNORE.
> > > > > > However, MIME::Lite is not being maintained any more (The owner
> > > > > recommends
> > > > > > using better modules, see this
> > > > > > <http://search.cpan.org/~rjbs/MIME-Lite-3.030/lib/MIME/Lite.
> > pm#WAIT
> > > > !>),
> > > > > > and
> > > > > > the latest Mojolicous::Plugin::Mail version still uses
> MIME::Lite.
> > > > > > (I guess Mojolicous is not very actively maintained these days).
> > > > > > Which means that until we switch to the experimental,
> > Angular-based,
> > > > > > Traffic Ops, we are stuck with the current MIME::Lite code.
> > > > > >
> > > > > > Maybe a good fix for this issue is to not set SIGCHLD to IGNORE
> > while
> > > > > > daemonizing. However, this means that every time we create a new
> > > > process,
> > > > > > we must waitpid() for it, otherwise we will accumulate zombies.
> > > > > > I guess it is not very practical to ensure this across the entire
> > > code
> > > > > base
> > > > > > of TO (And, as we have seen, all Perl modules that it uses).
> > > > > >
> > > > > > So the low-footprint fix I want to make is to wrap every call to
> > > > > > $self->mail(...) done in our Email.pm
> > > > > > <https://github.com/apache/incubator-trafficcontrol/blob/
> > > > > > master/traffic_ops/app/lib/MojoPlugins/Email.pm>
> > > > > > code, like this:
> > > > > >
> > > > > > $SIG{CHLD} = 'DEFAULT';
> > > > > > $self->mail(...);
> > > > > > $SIG{CHLD} = 'IGNORE';
> > > > > >
> > > > > > I tested this, and it fixes the problem.
> > > > > >
> > > > > > Thoughts ?
> > > > > >
> > > > > > I plan to create a JIRA and a PR, but I wanted to get someone's
> > more
> > > > > expert
> > > > > > view on this before I go on.
> > > > > > Being a bit of a novice, maybe I am missing some key thing here.
> > > > > >
> > > > > > Thanks ,Oren.
> > > > > > --
> > > > > >
> > > > > > *Oren Shemesh*
> > > > > > Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 |
> > > > [email protected]
> > > > > > <[email protected]>
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > *Oren Shemesh*
> > > Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 |
> [email protected]
> > > <[email protected]>
> > >
> >
>

Reply via email to