http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5731
------- Additional Comments From [EMAIL PROTECTED] 2008-02-04 14:56 ------- got one! I think... http://buildbot.spamassassin.org:8010/t-feisty-561/builds/880/step-test/0 contains this: t/spamd_kill_restart............# Failed test 41 in t/spamd_kill_restart.t at line 43 fail #5 # t/spamd_kill_restart.t line 43 is: ok (!-e $pid_file); spamd start failed: log: [15535] dbg: logger: adding facilities: all [15535] dbg: logger: logging level is DBG [15535] dbg: logger: successfully opened file log/d.spamd_kill_restart/spamd.err.42.timestamped [15535] dbg: logger: successfully added file method [15535] dbg: spamd: will perform setuids? 0 [15535] dbg: spamd: creating INET socket: [15535] dbg: spamd: Listen: 128 [15535] dbg: spamd: LocalAddr: 127.0.0.1 [15535] dbg: spamd: LocalPort: 48236 [15535] dbg: spamd: Proto: 6 [15535] dbg: spamd: ReuseAddr: 1 [15535] dbg: spamd: Type: 1 [15535] warn: server socket setup failed, retry 1: spamd: could not create INET socket on 127.0.0.1:48236: Address already in use http://taint.org/x/2008/bug5731.d.spamd_kill_restart.tgz contains the full logs from the spamd_kill_restart.t test case, with strace logs. Here's an examination of the key log, spamd.strace.33. First off, this is the last part of signal-handling before the error. note that SIGINT is unblocked: 15514 1202157063.466831 rt_sigprocmask(SIG_UNBLOCK, [INT CHLD], NULL, 8) = 0 15514 1202157063.468826 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 15524 1202157063.493003 time( <unfinished ...> 15514 1202157063.493065 rt_sigaction(SIGALRM, NULL, {SIG_DFL}, 8) = 0 15514 1202157063.493222 rt_sigaction(SIGALRM, {SIG_DFL}, {SIG_DFL}, 8) = 0 15514 1202157063.493367 rt_sigaction(SIGALRM, {0x8099360, [], SA_RESTART}, {SIG_DFL} , 8) = 0 15514 1202157063.493749 alarm(64) = 0 various unimportant spamd stuff omitted here. 15514 1202157063.499815 read(7, "I\0\0<\244\n", 6) = 6 15514 1202157063.500038 time(NULL) = 1202157063 15514 1202157063.500127 write(4, "Mon Feb 4 12:31:03 2008 [15514]"..., 77) = 77 15514 1202157063.500284 write(2, "[15514] dbg: prefork: child 1552"..., 52) = 52 15514 1202157063.503575 time(NULL) = 1202157063 15514 1202157064.696936 write(4, "Mon Feb 4 12:31:03 2008 [15514]"..., 74) = 74 15514 1202157064.697259 write(2, "[15514] dbg: prefork: new lowest"..., 49) = 49 15514 1202157064.697604 time(NULL) = 1202157064 15514 1202157064.697699 write(4, "Mon Feb 4 12:31:04 2008 [15514]"..., 66) = 66 15514 1202157064.697891 write(2, "[15514] dbg: prefork: child repo"..., 41) = 41 at this point, the test script kills the spamd using SIGINT: 15514 1202157064.698111 --- SIGINT (Interrupt) @ 0 (0) --- we should see perl calling into its $SIG{INT} handler. instead we see this: 15514 1202157064.698255 mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MA P_NORESERVE, -1, 0) = 0x408e8000 15514 1202157064.698376 munmap(0x408e8000, 98304) = 0 15514 1202157064.698466 munmap(0x40a00000, 950272) = 0 15514 1202157064.698596 mprotect(0x40900000, 135168, PROT_READ|PROT_WRITE) = 0 15514 1202157064.698743 futex(0x401a1120, FUTEX_WAIT, 2, NULL <unfinished ...> this futex() call sleeps. Nothing happens until eventually a child's select() call hits its timeout: 15524 1202157363.464875 <... select resumed> ) = 0 (Timeout) the kids exit due to their timeouts: 15523 1202157364.652244 exit_group(0) = ? and the parent proc is interrupted for the SIGCHLD: 15514 1202157364.675992 <... futex resumed> ) = -1 EINTR (Interrupted system call) 15514 1202157364.676062 --- SIGCHLD (Child exited) @ 0 (0) --- but then goes back into the futex(): 15514 1202157364.676664 futex(0x401a1120, FUTEX_WAIT, 2, NULL <unfinished ...> and never exits. Comparing the time when the SIGINT arrives to other test cases that worked ok (e.g. spamd.strace.1), I see no difference; the SIGINT handler is unblocked, and it should be able to handle the signal. This looks like a perl signal-handling bug. for reference, here's what should have happened at the SIGINT: 15473 1202157028.628858 --- SIGINT (Interrupt) @ 0 (0) --- 15473 1202157028.629083 time(NULL) = 1202157028 15473 1202157028.629191 write(4, "Mon Feb 4 12:30:28 2008 [15473]"..., 85) = 85 15473 1202157028.629362 write(2, "[15473] info: spamd: server kill"..., 60) = 60 15473 1202157028.629710 close(5) = 0 15473 1202157028.629789 munmap(0x402ff000, 4096) = 0 15473 1202157028.629875 close(5) = -1 EBADF (Bad file descriptor) 15473 1202157028.629947 munmap(0x402fe000, 4096) = 0 15473 1202157028.630033 unlink("/home/buildbot/slaves/t-feisty-561/t-feisty-561/build/t/log/spamd.pid") = 0 etc. futex() isn't called in the file *at all*. It ONLY appears in the failing strace.... At this point I think I should switch to testing a later perl build; fiddling with perl 5.6.1 signal-handling is a bit of a waste of time, and I noted that the Ubuntu issue we're trying to track down should occur with 5.8.x too. Ideally it should be repro'able with the gutsy package of perl. Also worth trying: tracing with ltrace instead of strace, to get a better idea of wtf that futex() call is up to. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
