Going a bit further with the topics of my previous post. ;-)

As hinted in the log excerpt, it could well be that Dovecot invokes 
getpwuid(3), directly or indirectly:

>       dovecot[97622] <Info>: master: Dovecot v2.1.14 starting up (core dumps 
> disabled)
>       [...]
>       dovecot[97624] <Debug>: pop3(user1): Debug: Namespace : Using 
> permissions from /_Data/Mailstores/100016/mboxes: mode=0700 gid=-1
>       com.apple.launchd[1] (com.apple.launchd.peruser.100016[97633]) <Error>: 
> getpwuid("100016") failed
>       com.apple.launchd[1] (com.apple.launchd.peruser.100016[97633]) 
> <Notice>: Job failed to exec(3). Setting up event to tell us when to try 
> again: 3: No such process
>       com.apple.launchd[1] (com.apple.launchd.peruser.100016[97633]) 
> <Notice>: Job failed to exec(3) for weird reason: 3

In this precise case (opening a POP connection for the first time), this seems 
to be an indirect call to getpwuid(3). Anyway, keeping pseudomaster.plist and 
pseudomaster.c unchanged, I quickly modified pseudochild.c so that it now calls 
getpwuid(3) directly.
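
As a side note on interpreting the result: getpwuid(3) returns NULL both when 
the lookup fails and when there is simply no entry for the uid, and in the 
latter case errno is left untouched. A minimal sketch of how the two cases 
could be told apart, assuming the reentrant getpwuid_r(3) is available; 
lookup_uid and the LOOKUP_* names are hypothetical and not part of 
pseudochild.c:

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum lookup_result {
        LOOKUP_FOUND    = 0,    /* an entry exists, name copied out      */
        LOOKUP_NO_ENTRY = 1,    /* lookup worked, but no such uid        */
        LOOKUP_ERROR    = -1    /* lookup itself failed, see *err_out    */
};

/* Hypothetical helper: look up uid and copy the login name to name_out. */
enum lookup_result lookup_uid(uid_t uid, char *name_out, size_t len,
                              int *err_out)
{
        struct passwd pwd;
        struct passwd *result = NULL;
        char buf[16384];        /* assumed large enough for one entry */
        int rc;

        assert(name_out != NULL && len > 0);

        /* getpwuid_r() returns an error number instead of setting errno. */
        rc = getpwuid_r(uid, &pwd, buf, sizeof buf, &result);
        if (rc != 0)
        {
                if (err_out != NULL)
                        *err_out = rc;
                return LOOKUP_ERROR;
        }
        if (result == NULL)
                return LOOKUP_NO_ENTRY;

        strncpy(name_out, pwd.pw_name, len - 1);
        name_out[len - 1] = '\0';
        return LOOKUP_FOUND;
}
```

Of course this only helps when the call returns at all, which is precisely 
what does not happen in the hang described here.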

When loading pseudomaster.plist for the first time, this is what gets written 
to the logs:

        ALMba.local pseudomaster[66978] <Notice>: Master: started.
        ALMba.local pseudomaster[66980] <Notice>: Child: forked.
        ALMba.local pseudochild[66980] <Notice>: Pseudochild: started.
        ALMba com.apple.launchd[1] (com.apple.launchd.peruser.100018[66981]) <Error>: getpwuid("100018") failed
        ALMba com.apple.launchd[1] (com.apple.launchd.peruser.100018[66981]) <Notice>: Job failed to exec(3). Setting up event to tell us when to try again: 3: No such process
        ALMba com.apple.launchd[1] (com.apple.launchd.peruser.100018[66981]) <Notice>: Job failed to exec(3) for weird reason: 3

and the pseudochild process just hangs.

Note the process number 66981, which is not pseudochild's (66980), as if 
there were an attempt to spawn a subprocess. And this is a very elusive one; 
for example, there is no way to catch it with execsnoop or similar tools. Even 
launchd appears somewhat lost.

Subsequent unloads/reloads of pseudomaster.plist always end with a hanging 
pseudochild process, yet without those com.apple.launchd.peruser messages 
anymore.
In fact, in order to get those messages back, one has to remove the job bearing 
label "com.apple.launchd.peruser.100018".

And this is without mentioning the directories 
/var/log/com.apple.launchd.peruser.100018 and 
/var/db/launchd/com.apple.launchd.peruser.100018 created under such 
circumstances; moreover, those directories seem to persist across reboots...

So, all of this looks quite similar to the problems encountered with Dovecot; 
the fact that the stack of the hung pop process ends around a gethostbyname 
call could thus just be a red herring.

On the other hand, Dovecot manages to go a bit further than my pseudochild.c 
code: with a uid/gid pair such as 100018/20, Dovecot's pop3 process doesn't 
hang and performs without a glitch, while pseudochild desperately insists on 
getting stuck.
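
To compare pairs such as 100018/100018 and 100018/20 without recompiling, the 
hard-coded values could be replaced by something along these lines. This is 
only a sketch: parse_id_pair and drop_to are hypothetical helper names, and 
drop_to merely repeats the setgid/setgroups/setuid sequence of pseudochild.c 
(it needs to run as root to succeed):

```c
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: parse "uid/gid", e.g. "100018/20".
 * Returns 0 on success, -1 if the string is malformed. */
int parse_id_pair(const char *s, uid_t *uid, gid_t *gid)
{
        char *end;
        unsigned long u, g;

        assert(s != NULL && uid != NULL && gid != NULL);

        errno = 0;
        u = strtoul(s, &end, 10);
        if (errno != 0 || end == s || *end != '/')
                return -1;

        s = end + 1;
        g = strtoul(s, &end, 10);
        if (errno != 0 || end == s || *end != '\0')
                return -1;

        *uid = (uid_t)u;
        *gid = (gid_t)g;
        return 0;
}

/* Hypothetical helper: same order as in pseudochild.c, i.e. group,
 * then supplementary groups, then user. Returns 0 on success, -1 on
 * the first failing call (errno is left as set by that call). */
int drop_to(uid_t uid, gid_t gid)
{
        gid_t gidset[1];

        gidset[0] = gid;
        if (setgid(gid) != 0)
                return -1;
        if (setgroups(1, gidset) != 0)
                return -1;
        if (setuid(uid) != 0)
                return -1;
        return 0;
}
```

A main() would then pass argv[1] to parse_id_pair, call drop_to, and go on 
with the getpwuid(3) call as before.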

Currently, I'm in a wtf? mood...
I would really appreciate some hints, some explanations for all those 
phenomena...

TIA,
Axel



pseudochild.c
=============
#include <syslog.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <pwd.h>
#include <uuid/uuid.h>

int   main( int argc, const char * argv[]) 
{
        struct passwd * pw;
        gid_t gidset[1];

        uid_t uid = 100018;
        gid_t gid = 100018;
        
        gidset[0] = gid;

        syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: started.");
        
        if (setgid(gid) != 0)
        {
                syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: setgid() failed.");
                _exit(1);
        }
        if (setgroups(1, gidset) != 0)
        {
                syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: setgroups() failed.");
                _exit(1);
        }
        if (setuid(uid) != 0)
        {
                syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: setuid() failed.");
                _exit(1);
        }
        
        errno = 0;
        pw = getpwuid(uid);
        if ( pw != NULL )
        {
                syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: getpwuid() succeeded");
                _exit(0);
        }
        else
        {
                syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: getpwuid() failed with rc %i", errno);
                _exit(1);
        }
}


_______________________________________________
launchd-dev mailing list
launchd-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/launchd-dev
