I could not find the function on my new Linux machine. It looks like a thread reentrancy problem - very strange. Is this a multiprocessor machine?
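One way to narrow this down is to walk the same netgroup from a small standalone program, outside cfagent and its threads. The following is only a minimal sketch under stated assumptions: the helper names and the NETGROUP_DEMO_MAIN macro are made up for illustration and are not part of cfengine, and treating a NULL host field as a wildcard is an assumption borrowed from how innetgr treats empty fields. The glibc manual documents that getnetgrent sets a pointer to NULL when the field is empty, so the sketch avoids passing a NULL to strcmp or to a %s format.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* glibc: expose the netgroup calls in <netdb.h> */
#endif
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does one netgroup host field match `host`?
 * Assumption: a NULL (empty) field is treated as a wildcard. */
int host_field_matches(const char *field, const char *host)
{
    assert(host != NULL);
    if (field == NULL)
        return 1;
    return strcmp(field, host) == 0;
}

/* Printable form of a possibly-NULL field. */
const char *printable(const char *s)
{
    return s ? s : "(null)";
}

/* Hypothetical helper: walk `netgroup` and report whether `host` is listed.
 * Prints every triple so a stall can be located by the last line shown. */
int netgroup_contains_host(const char *netgroup, const char *host)
{
    char *machine = NULL, *user = NULL, *domain = NULL;
    int found = 0;

    setnetgrent(netgroup);
    while (!found && getnetgrent(&machine, &user, &domain)) {
        printf("m = %s, u = %s, d = %s\n",
               printable(machine), printable(user), printable(domain));
        found = host_field_matches(machine, host);
    }
    endnetgrent();
    return found;
}

#ifdef NETGROUP_DEMO_MAIN
/* Exit status: 0 if the host was found, 1 if not, 2 on bad usage. */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s netgroup host\n", argv[0]);
        return 2;
    }
    return netgroup_contains_host(argv[1], argv[2]) ? 0 : 1;
}
#endif
```

Compiled with -DNETGROUP_DEMO_MAIN and run as "./a.out ops-beta-sos5 somehost" (the host name here is a placeholder), it should return promptly if the lookup itself is healthy. If it does, that would point at the interaction with cfagent rather than at the netgroup calls.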
It might take me a little while to look at this as I am about to travel to the US for a week.

M

On Fri, 2005-04-08 at 10:12 -0700, Mark Keller wrote:
> > Mark, this sounds odd. Nothing has changed here in the cfengine code for
> > many years. I don't think the innetgr function is portable. It looks
> > solarissy. Have you tried doing an strace (truss) to see exactly what it
> > gets stuck on?
> >
>
> The innetgr function seems to be on Solaris, Redhat EL3, Fedora, HPUX 11,
> FreeBSD, and Mac OS X in netdb.h (just doing a quick google search anyhow). I
> suppose it might not be on all OS's though. Am-utils uses it exclusively for
> netgroup calls for all OS's it supports and so does pam_access, which are
> working for me.
>
> So I added a couple of debug statements to the original code:
>
>     case netgroup: setnetgrent(ebuff);
>         Debug1("set netgroup to %s\n",ebuff);
>
>         while (getnetgrent(&machine,&user,&domain))
>            {
>            Debug1("getnetgrent m = %s, u = %s, d = %s in netgroup %s\n",machine,user,domain,ebuff);
>            if (strcmp(machine,VDEFAULTBINSERVER.name) == 0)
>               {
>               Debug1("Matched %s in netgroup %s\n",machine,ebuff);
>               AddClassToHeap(GROUPBUFF);
>               break;
>               }
>            ...truncated...
>            }
>
>         endnetgrent();
>         break;
>
> Running "cfagent -Bvq -d1", I get the following:
>
> ....truncated.....
> ==============================BEGIN NEW ACTION Groups:=============
>
>
> Resetting CLASS to ANY
>
> LVALUE betahosts
> HandleLVALUE(betahosts) in action Groups:
> EQUALS =
> LEFTBRACK
> RVAL-VAROBJ [EMAIL PROTECTED]
>
> HandleGroupRvalue([EMAIL PROTECTED])
> Netgroup rval, lookup NIS group (ops-beta-sos5)
> ExpandVarstring(ops-beta-sos5)
> HandleGroupRVal(ops-beta-sos5) group (betahosts), type=1
> set netgroup to ops-beta-sos5
> cfengine:: Time out
>
> It takes a couple of minutes before I get the "Time out" error and then
> cfagent just sits there for an extremely long time.
>
>
> Now Running "truss -f cfagent -Bvq -d1", I get the following:
>
> ....truncated.......
> 8328: write(5, "160301\08610\0\082\0809F".., 182) = 182
> 8328: read(5, "140301\001", 5) = 5
> 8328: read(5, "01", 1) = 1
> 8328: read(5, "160301\0 ", 5) = 5
> 8328: read(5, " O 09D12 G1EF981 kF59D b".., 32) = 32
> 8328: time() = 1112978413
> 8328: write(5, "170301\01ED8 ]BE 3 =B0 ]".., 35) = 35
> 8328: time() = 1112978413
> 8328: poll(0xFFBF9708, 1, 30000) = 1
> 8328: read(5, "170301\01E", 5) = 5
> 8328: read(5, "C8C5A8C1EB N $1A19 g H05".., 30) = 30
> 8328: time() = 1112978413
> 8328: setsockopt(5, SOL_SOCKET, SO_KEEPALIVE, 0xFFBFB8D0, 4, 1) = 0
> 8328: fcntl(5, F_SETFD, 0x00000001) = 0
> 8328: getsockname(5, 0xFEFF00C0, 0xFFBFB8CC, 1) = 0
> 8328: getpeername(5, 0xFEFF00D0, 0xFFBFB8C8, 1) = 0
> 8328: time() = 1112978413
> 8328: getpid() = 8328 [8327]
> 8328: getuid() = 0 [0]
> 8328: time() = 1112978413
> 8328: write(5, "170301\095D58BA9\r FAE S".., 154) = 154
> 8328: poll(0xFFBF96B0, 1, -1) = 1
> 8328: read(5, "17030101 .", 5) = 5
> 8328: read(5, "\tF20215D380\0B3A41BDD83".., 302) = 302
> 8328: poll(0xFFBF96B0, 1, -1) = 1
> 8328: read(5, "170301\01E", 5) = 5
> 8328: read(5, "9CBC\b G02 [0691B2\nFBF0".., 30) = 30
> 8328: time() = 1112978413
> 8328: time() = 1112978413
> 8328: sigaction(SIGPIPE, 0xFFBFC0C0, 0x00000000) = 0
> set netgroup to ops-beta-sos5
> 8328: write(1, " s e t n e t g r o u p".., 30) = 30
> 8328: sigaction(SIGPIPE, 0xFFBFC118, 0xFEFF10E0) = 0
> 8328: mmap(0x00000000, 110592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEDA0000
> 8328: lwp_park(0x00000000, 0) (sleeping...)
> 8328: Received signal #14, SIGALRM, in lwp_park() [caught]
> 8328: lwp_park(0x00000000, 0) Err#4 EINTR
> 8328: sigprocmask(SIG_SETMASK, 0xFFBFBC54, 0x00000000) = 0
> 8328: alarm(0) = 0
> cfengine:: Time out
> 8328: write(1, " c f e n g i n e : : T".., 20) = 20
> 8328: sigprocmask(SIG_SETMASK, 0xFF07A074, 0xFFBFBA08) = 0
> 8328: lwp_unpark(1, 1) = 0
> 8328: setcontext(0xFFBFBA18)
> 8328: lwp_park(0x00000000, 0) = 0
> 8328: lwp_park(0x00000000, 0) (sleeping...)
>
> Here it takes a couple of minutes between the first lwp_park and the "Time
> out" error, then it stays on the last lwp_park for an extremely long time. It
> might be infinite, but it is at least hours. If you need more trussing please
> let me know.
>
> So to me it seems like something is going wrong around the getnetgrent
> function. I can't tell what might be the problem as my C coding/debugging is
> rusty. My small test code using getnetgrent seems to work just fine.
>
> Thanks,
>
> Mark Keller
>
> _______________________________________________
> Bug-cfengine mailing list
> Bug-cfengine@gnu.org
> http://lists.gnu.org/mailman/listinfo/bug-cfengine