Finally - Created TC-158 <https://issues.apache.org/jira/browse/TC-158> and submitted PR #295 <https://github.com/apache/incubator-trafficcontrol/pull/295> against it :-)
On Fri, Feb 17, 2017 at 8:39 PM, Jeremy Mitchell <[email protected]> wrote: > @oren - did you ever submit a PR for this? I don't recall seeing it. > > Thanks, Jeremy > > On Wed, Feb 8, 2017 at 10:39 AM, Oren Shemesh <[email protected]> wrote: > > > Sure, will do. > > > > On Feb 8, 2017 3:53 PM, "Dave Neuman" <[email protected]> wrote: > > > > > Can you create an issue and submit a PR so we can take a look at it? > > > Thanks, > > > Dave > > > > > > On Tue, Feb 7, 2017 at 10:03 PM, Oren Shemesh <[email protected]> wrote: > > > > > > > Yes, that is what I am suggesting. > > > > > > > > I believe it will make the code in PR 155 redundant, as well. > > > > > > > > > > > > On Feb 8, 2017 00:50, "Jeremy Mitchell" <[email protected]> > wrote: > > > > > > > > So you're suggesting a different solution that this? > > > > > > > > https://github.com/apache/incubator-trafficcontrol/commit/ > > > > abb936a85f6bb09169329b8daf1d2b2c094e07f1 > > > > > > > > you're suggesting that the code in that commit be reverted and a > > > > double-fork be pursued instead? > > > > > > > > jeremy > > > > > > > > On Tue, Feb 7, 2017 at 3:33 PM, Oren Shemesh <[email protected]> > wrote: > > > > > > > > > I believe that the merge of PR 155 solved a problem, in the same > > way I > > > > > tried to solve the mail problem: Make sure that SIGCHLD is set to > > > DEFAULT > > > > > and not to IGNORE. > > > > > > > > > > But after researching a bit about daemonizing in Python, I see no > > > > > reason for the $SIG{CHLD} = 'IGNORE' statement to be a part of > basic > > > > > daemonizing. > > > > > Hence, I believe that this statement might be the root cause of the > > > > > problem. > > > > > Without setting SIGCHLD to IGNORE, both the problem PR 155 solves, > > and > > > > the > > > > > mail problem I stumbled upon, will be gone. > > > > > > > > > > This statement was added in this commit: > > > > > > > > > > commit abb936a85f6bb09169329b8daf1d2b2c094e07f1 > > > > > > > > > > Fixes an issue where forked processes become defunct rather > than > > > > > exiting cleanly. > > > > > > > > > > I suggest to solve the problem of zombie processes with a > double-fork > > > > > (Double-forking is a common technique in daemonizing, used to make > > sure > > > > the > > > > > daemon is completely detached from it's parent): > > > > > > > > > > 1. Parent forks first child > > > > > 2. Parent does waitpid() > > > > > 3. First child forks the daemon > > > > > 4. First child exits with a return code indicating the success of > the > > > > > second fork. > > > > > 5. Parent gets this return code and knows that the second fork went > > OK. > > > > > 6. Because the first child is dead, the parent of the daemon is set > > to > > > 1 > > > > > (init), so when the daemon exits, it does not become a zombie (init > > > > > reclaims it). > > > > > > > > > > This way the SIGCHLD handler is not affected, in both parent and > > > daemon. > > > > > > > > > > Comments ? > > > > > > > > > > Thanks, Oren. > > > > > > > > > > > > > > > > > > > > On Tue, Feb 7, 2017 at 10:40 PM, Dave Neuman <[email protected]> > > > wrote: > > > > > > > > > > > Thanks Jeremy. We did merge that PR not too long ago to fix one > > > issue, > > > > > but > > > > > > it looks like it created another. I agree that it would be > better > > to > > > > > use a > > > > > > different library. > > > > > > > > > > > > On Tue, Feb 7, 2017 at 1:29 PM, Jeremy Mitchell < > > > [email protected] > > > > > > > > > > > wrote: > > > > > > > > > > > > > I'll have to defer to someone else as most of that was way over > > my > > > > head > > > > > > but > > > > > > > I do know we have this ticket outstanding: > > > > > > > https://issues.apache.org/jira/browse/TC-74 > > > > > > > > > > > > > > and also, this PR was merged a while back regarding $SIG{CHLD} > - > > > > > > > https://github.com/apache/incubator-trafficcontrol/pull/155 > > > > > > > > > > > > > > I'm not really sure if it's a good or bad idea to wrap every > call > > > to > > > > > > > $self->mail(...); > > > > > > > as you suggested. maybe others have thoughts on that. > > > > > > > > > > > > > > preferably, it would be nice to get TC-74 done and remove our > > > > > dependency > > > > > > on > > > > > > > an outdated library. > > > > > > > > > > > > > > jeremy > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 7, 2017 at 4:27 AM, Oren Shemesh <[email protected]> > > > > wrote: > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > I'm looking for someone to form an opinion about a problem I > > > > believe > > > > > I > > > > > > > have > > > > > > > > found, and a fix. > > > > > > > > > > > > > > > > The problem manifests itself as an error when TO attempts to > > send > > > > an > > > > > > > > E-mail. > > > > > > > > > > > > > > > > The details: > > > > > > > > When traffic ops runs, it daemonizes itself, and as part of > > that, > > > > it > > > > > > does > > > > > > > > $SIG{CHLD} = 'IGNORE'; : > > > > > > > > > > > > > > > > https://github.com/apache/incubator-trafficcontrol/blob/ > > > > > > > > master/traffic_ops/app/lib/MojoPlugins/Daemonize.pm#L28 > > > > > > > > > > > > > > > > But, when sending a mail, TO uses Mojolicous::Plugin::Mail, > > which > > > > in > > > > > > turn > > > > > > > > is a wrapper for the standard (And Obsolete) perl module > > > > MIME::Lite. > > > > > > > > > > > > > > > > MIME::Lite sends a mail by running the following piece of > code > > > > > (Snippet > > > > > > > > from app/local/lib/perl5/MIME/Lite.pm): > > > > > > > > > > > > > > > > my $pid = open SENDMAIL, "|-"; > > > > > > > > defined($pid) or die "open of pipe failed: $!\n"; > > > > > > > > if ( !$pid ) { ### child > > > > > > > > exec(@cmd) or die "can't exec $p{Sendmail}: > $!\n"; > > > > > > > > ### NOTREACHED > > > > > > > > } else { ### parent > > > > > > > > $self->print( \*SENDMAIL ); > > > > > > > > close SENDMAIL || die "error closing > $p{Sendmail}: > > $! > > > > > (exit > > > > > > > > $?)\n"; > > > > > > > > $return = 1; > > > > > > > > } > > > > > > > > > > > > > > > > The problem with this code is that assumes that the SIGCHLD > > > handler > > > > > is > > > > > > > left > > > > > > > > at it's DEFAULT handling, which means that when a child > process > > > > > > > terminates, > > > > > > > > the parent process can reap it during the 'close' statement. > > > > > > > > Traffic Ops runs with the SIGCHLD handler set to IGNORE, > which > > > > means > > > > > > that > > > > > > > > when a child process dies, the parent process forgets about > it > > > > > > > immediately. > > > > > > > > So, if sendmail runs fast enough and finishes before the > > 'close' > > > > > > command > > > > > > > is > > > > > > > > executed, the close will fail to find the child process, and > > the > > > > > error > > > > > > > > propagates all the way to the calling code, looking something > > > like > > > > > > this: > > > > > > > > > > > > > > > > Traffic Ops fatal error occurred while processing your > request. > > > > > > > > Error at line 2732 ( close SENDMAIL || die "error closing > > > > > $p{Sendmail}: > > > > > > > $! > > > > > > > > (exit $?)\n";) > > > > > > > > error closing /usr/lib/sendmail: No child processes (exit -1) > > > > > > > > > > > > > > > > It should be noted that we are running TO on single-core > > machines > > > > for > > > > > > > > development, which means that the code is much more likely to > > > fail > > > > in > > > > > > > this > > > > > > > > manner (In machines with more cores, the parent process > usually > > > > does > > > > > > not > > > > > > > > get suspended when sendmail is spawned, hence it reaches the > > > > 'close' > > > > > > > > command immediately and waits for sendmail to finish > properly). > > > > > > > > > > > > > > > > I am not sure why TO needs to completely ignore SIGCHLD, and > I > > am > > > > not > > > > > > > sure > > > > > > > > if it is reasonable to ask from a module such as MIME::Lite > to > > > work > > > > > > when > > > > > > > > SIGCHLD is set to IGNORE. > > > > > > > > However, MIME::Lite is not being maintained any more (The > owner > > > > > > > recommends > > > > > > > > using better modules, see this > > > > > > > > <http://search.cpan.org/~rjbs/MIME-Lite-3.030/lib/MIME/Lite. > > > > pm#WAIT > > > > > > !>), > > > > > > > > and > > > > > > > > the latest Mojolicous::Plugin::Mail version still uses > > > MIME::Lite. > > > > > > > > (I guess Mojolicous is not very actively maintained these > > days). > > > > > > > > Which means that until we switch to the experimental, > > > > Angular-based, > > > > > > > > Traffic Ops, we are stuck with the current MIME::Lite code. > > > > > > > > > > > > > > > > Maybe a good fix for this issue is to not set SIGCHLD to > IGNORE > > > > while > > > > > > > > daemonizing. However, this means that every time we create a > > new > > > > > > process, > > > > > > > > we must waitpid() for it, otherwise we will accumulate > zombies. > > > > > > > > I guess it is not very practical to ensure this across the > > entire > > > > > code > > > > > > > base > > > > > > > > of TO (And, as we have seen, all Perl modules that it uses). > > > > > > > > > > > > > > > > So the low-footprint fix I want to make is to wrap every call > > to > > > > > > > > $self->mail(...) done in our Email.pm > > > > > > > > <https://github.com/apache/incubator-trafficcontrol/blob/ > > > > > > > > master/traffic_ops/app/lib/MojoPlugins/Email.pm> > > > > > > > > code, like this: > > > > > > > > > > > > > > > > $SIG{CHLD} = 'DEFAULT'; > > > > > > > > $self->mail(...); > > > > > > > > $SIG{CHLD} = 'IGNORE'; > > > > > > > > > > > > > > > > I tested this, and it fixes the problem. > > > > > > > > > > > > > > > > Thoughts ? > > > > > > > > > > > > > > > > I plan to create a JIRA and a PR, but I wanted to get > someone's > > > > more > > > > > > > expert > > > > > > > > view on this before I go on. > > > > > > > > Being a bit of a novice, maybe I am missing some key thing > > here. > > > > > > > > > > > > > > > > Thanks ,Oren. > > > > > > > > -- > > > > > > > > > > > > > > > > *Oren Shemesh* > > > > > > > > Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | > > > > > > [email protected] > > > > > > > > <[email protected]> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > *Oren Shemesh* > > > > > Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | > > > [email protected] > > > > > <[email protected]> > > > > > > > > > > > > > > > -- *Oren Shemesh* Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | [email protected] <[email protected]>
