Hi.
I am getting some issues at incoming smtp connections for some weeks.
esmtpd MAXDAEMONS is set to 350 and I at peak times I used to readh at max 150
simultaneous courieresmtpd process running.
But now I am reaching the top limit many times a day, then my servers stop
acceping new connections.
After collecting some data on this problem it seems is related to the raising
number of infected machines running smtp Zombies. This Zombies keeps trying to
send spam, and, even if I deny all try, they never close the smtp connection.
Courieresmtpd process keeps running for hours. At some poing submit process
kills himself at the hard 1800 seconds alarm timeout. But even with the submit
at defunct state courieresmtpd process keeps running.
My only way to keep the low number of courieresmtpd process is to block the
originating IP at firewall or at a smtpaccess DENY at couriertcpd. If the send
is blocked at some other point (BLACKLIST, BOTH, SPF, smtpaccess BLOCK, RELAY
DENY or USER UNKNOWN) the submit process had already started and the
courieresmtpd process stay locked for hours. I am loosing this game. I already
blocked more 500 /24 nets but the number of locked connections keeps raising
until I restart esmtpd.
Zombies is what I suspect, but seems smtp incomming connections are getting
stucked for other reasons too.
Other strange issue that start happening is lots of "writev: Broken pipe"
errors. May be its related to the Zombies, but I have seen this error happening
on normal smtp connections too.
Is this happening to any other?
Anyone have an idea of how to get this problem fixed. Seems I need some
configuration on courieresmtpd process to a faster timeout, or a limit of
errors each smtp connection can have, or a time limit where a smtp conection
shouls start the DATA transfer.
Below is a lot of debugging information I got. Sorry for the long post.
Marcus Pereira
***************************
Showing lots of submit process in defunct state:
# ps f -o pid,ppid,etime,cmd -C courieresmtpd,submit | grep -B1 defunct
18207 12590 00:09 /usr/sbin/courieresmtpd
18587 18207 00:01 \_ [submit] <defunct>
--
8586 12590 06:43 /usr/sbin/courieresmtpd
17753 8586 00:23 \_ [submit] <defunct>
--
32301 12590 30:38 /usr/sbin/courieresmtpd
32324 32301 30:38 \_ [submit] <defunct>
--
17572 12590 39:25 /usr/sbin/courieresmtpd
17700 17572 39:21 \_ [submit] <defunct>
17383 12590 39:32 /usr/sbin/courieresmtpd
17397 17383 39:31 \_ [submit] <defunct>
15410 12590 40:29 /usr/sbin/courieresmtpd
15497 15410 40:26 \_ [submit] <defunct>
13700 12590 41:26 /usr/sbin/courieresmtpd
13746 13700 41:24 \_ [submit] <defunct>
13438 12590 41:33 /usr/sbin/courieresmtpd
13449 13438 41:33 \_ [submit] <defunct>
10004 12590 43:27 /usr/sbin/courieresmtpd
10013 10004 43:27 \_ [submit] <defunct>
8705 12590 44:17 /usr/sbin/courieresmtpd
8887 8705 44:11 \_ [submit] <defunct>
7493 12590 45:10 /usr/sbin/courieresmtpd
7716 7493 45:00 \_ [submit] <defunct>
7172 12590 45:27 /usr/sbin/courieresmtpd
7195 7172 45:26 \_ [submit] <defunct>
6222 12590 46:02 /usr/sbin/courieresmtpd
6304 6222 45:58 \_ [submit] <defunct>
5028 12590 46:51 /usr/sbin/courieresmtpd
5032 5028 46:51 \_ [submit] <defunct>
5025 12590 46:51 /usr/sbin/courieresmtpd
5031 5025 46:51 \_ [submit] <defunct>
4328 12590 47:15 /usr/sbin/courieresmtpd
4355 4328 47:14 \_ [submit] <defunct>
2213 12590 48:42 /usr/sbin/courieresmtpd
2269 2213 48:40 \_ [submit] <defunct>
2045 12590 48:46 /usr/sbin/courieresmtpd
2178 2045 48:44 \_ [submit] <defunct>
1804 12590 48:55 /usr/sbin/courieresmtpd
1856 1804 48:53 \_ [submit] <defunct>
1726 12590 48:59 /usr/sbin/courieresmtpd
1740 1726 48:58 \_ [submit] <defunct>
980 12590 49:22 /usr/sbin/courieresmtpd
2634 980 48:25 \_ [submit] <defunct>
581 12590 49:34 /usr/sbin/courieresmtpd
593 581 49:33 \_ [submit] <defunct>
31665 12590 50:29 /usr/sbin/courieresmtpd
31725 31665 50:28 \_ [submit] <defunct>
29477 12590 51:47 /usr/sbin/courieresmtpd
16372 29477 01:04 \_ [submit] <defunct>
28991 12590 52:06 /usr/sbin/courieresmtpd
29053 28991 52:05 \_ [submit] <defunct>
28727 12590 52:18 /usr/sbin/courieresmtpd
28740 28727 52:17 \_ [submit] <defunct>
28483 12590 52:29 /usr/sbin/courieresmtpd
28518 28483 52:27 \_ [submit] <defunct>
28121 12590 52:44 /usr/sbin/courieresmtpd
31412 28121 50:41 \_ [submit] <defunct>
27343 12590 53:11 /usr/sbin/courieresmtpd
27397 27343 53:09 \_ [submit] <defunct>
27338 12590 53:11 /usr/sbin/courieresmtpd
27347 27338 53:11 \_ [submit] <defunct>
25839 12590 54:14 /usr/sbin/courieresmtpd
25880 25839 54:13 \_ [submit] <defunct>
24281 12590 55:13 /usr/sbin/courieresmtpd
24392 24281 55:11 \_ [submit] <defunct>
23157 12590 55:54 /usr/sbin/courieresmtpd
23210 23157 55:52 \_ [submit] <defunct>
21497 12590 56:58 /usr/sbin/courieresmtpd
21502 21497 56:57 \_ [submit] <defunct>
20528 12590 57:40 /usr/sbin/courieresmtpd
20564 20528 57:38 \_ [submit] <defunct>
20255 12590 57:50 /usr/sbin/courieresmtpd
20256 20255 57:50 \_ [submit] <defunct>
19662 12590 58:16 /usr/sbin/courieresmtpd
19685 19662 58:15 \_ [submit] <defunct>
18499 12590 58:56 /usr/sbin/courieresmtpd
18562 18499 58:55 \_ [submit] <defunct>
16709 12590 01:00:03 /usr/sbin/courieresmtpd
16770 16709 01:00:01 \_ [submit] <defunct>
16433 12590 01:00:16 /usr/sbin/courieresmtpd
16447 16433 01:00:16 \_ [submit] <defunct>
15893 12590 01:00:45 /usr/sbin/courieresmtpd
15927 15893 01:00:44 \_ [submit] <defunct>
15785 12590 01:00:49 /usr/sbin/courieresmtpd
15845 15785 01:00:47 \_ [submit] <defunct>
15079 12590 01:01:11 /usr/sbin/courieresmtpd
15150 15079 01:01:09 \_ [submit] <defunct>
14685 12590 01:01:28 /usr/sbin/courieresmtpd
14720 14685 01:01:27 \_ [submit] <defunct>
13389 12590 01:02:21 /usr/sbin/courieresmtpd
13426 13389 01:02:20 \_ [submit] <defunct>
12599 12590 01:03:00 /usr/sbin/courieresmtpd
9451 12599 43:50 \_ [submit] <defunct>
###############################################################
Getting, for example, strace for courieresmtpd process 20528
# tail -n 27 strace.20528
10:52:21 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:52:21 nanosleep({128, 0}, {128, 0}) = 0
10:54:29 sendto(3, "<19>Apr 17 10:54:29 courieresmtpd: error,relay=::f"...,
189, MSG_NOSIGNAL, NULL, 0) = 189
10:54:29 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 55)
= -1 EPIPE (Broken pipe)
10:54:29 --- SIGPIPE (Broken pipe) @ 0 (0) ---
10:54:29 write(5, "[email protected]\t\t\n", 36) = 36
10:54:29 read(6, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 4096)
= 54
10:54:29 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
10:54:29 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
10:54:29 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:54:29 nanosleep({128, 0}, {128, 0}) = 0
10:56:37 sendto(3, "<19>Apr 17 10:56:37 courieresmtpd: error,relay=::f"...,
189, MSG_NOSIGNAL, NULL, 0) = 189
10:56:37 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 55)
= -1 EPIPE (Broken pipe)
10:56:37 --- SIGPIPE (Broken pipe) @ 0 (0) ---
10:56:37 write(5, "[email protected]\t\t\n", 35) = 35
10:56:37 read(6, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 4096)
= 54
10:56:37 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
10:56:37 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
10:56:37 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:56:37 nanosleep({128, 0}, {128, 0}) = 0
10:58:45 sendto(3, "<19>Apr 17 10:58:45 courieresmtpd: error,relay=::f"...,
188, MSG_NOSIGNAL, NULL, 0) = 188
10:58:45 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 55)
= -1 EPIPE (Broken pipe)
10:58:45 --- SIGPIPE (Broken pipe) @ 0 (0) ---
10:58:45 sendto(3, "<22>Apr 17 10:58:45 courieresmtpd: error,relay=::f"...,
107, MSG_NOSIGNAL, NULL, 0) = 107
10:58:45 select(2, NULL, [1], NULL, {4800, 0}) = ? ERESTARTNOHAND (To be
restarted)
11:18:10 --- SIGCHLD (Child exited) @ 0 (0) ---
11:18:10 select(2, NULL, [1], NULL, {3634, 984000}
#########################
And for the child submit process pid 20564
# tail -n 27 strace.20564
10:54:29 lseek(6, 0, SEEK_SET) = 0
10:54:29 read(6, "rootfs / rootfs rw 0 0\nnone /sys sysfs rw,nosuid,n"...,
1024) = 874
10:54:29 stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
10:54:29 close(6) = 0
10:54:29 munmap(0x7f368cbd3000, 4096) = 0
10:54:29 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 54)
= 54
10:54:29 poll([{fd=0, events=POLLIN|POLLERR|POLLHUP}], 1, -1)
= 1 ([{fd=0, revents=POLLIN}])
10:56:37 read(0, "[email protected]\t\t\n", 8192) = 35
10:56:37 statfs(".", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096,
f_blocks=34151844, f_bfree=32936550, f_bavail=31201734, f_files=8675328,
f_ffree=8559136, f_fsid={162176456, -721074552}, f_namelen=255, f_frsize=4096})
= 0
10:56:37 stat(".", {st_mode=S_IFDIR|0770, st_size=4096, ...}) = 0
10:56:37 open("/proc/mounts", O_RDONLY) = 6
10:56:37 fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
10:56:37 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f368cbd3000
10:56:37 read(6, "rootfs / rootfs rw 0 0\nnone /sys sysfs rw,nosuid,n"...,
1024) = 874
10:56:37 read(6, "", 1024) = 0
10:56:37 lseek(6, 0, SEEK_SET) = 0
10:56:37 read(6, "rootfs / rootfs rw 0 0\nnone /sys sysfs rw,nosuid,n"...,
1024) = 874
10:56:37 stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
10:56:37 close(6) = 0
10:56:37 munmap(0x7f368cbd3000, 4096) = 0
10:56:37 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 54)
= 54
10:56:37 poll([{fd=0, events=POLLIN|POLLERR|POLLHUP}], 1, -1) = ?
ERESTART_RESTARTBLOCK (To be restarted)
11:18:10 --- SIGALRM (Alarm clock) @ 0 (0) ---
11:18:10 rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART,
0x7f368b8ac190}, {0x411620, [INT], SA_RESTORER|SA_RESTART, 0x7f368b8ac190}, 8)
= 0
11:18:10 getpid() = 20564
11:18:10 kill(20564, SIGKILL) = 0
11:18:10 +++ killed by SIGKILL +++
###############################
mail.log for this IP:
# grep 190.42.107 /var/log/mail.log
Apr 17 10:48:08 cuma courieresmtpd: started,ip=[::ffff:190.42.107.41]
Apr 17 10:48:20 cuma courieresmtpd:
error,relay=::ffff:190.42.107.41,from=<[email protected]>,to=<[email protected]>:
511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:48:36 cuma courieresmtpd:
error,relay=::ffff:190.42.107.41,from=<[email protected]>,to=<[email protected]>:
511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:49:08 cuma courieresmtpd:
error,relay=::ffff:190.42.107.41,from=<[email protected]>,to=<[email protected]>:
511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:50:13 cuma courieresmtpd:
error,relay=::ffff:190.42.107.41,from=<[email protected]>,to=<[email protected]>:
511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:52:21 cuma courieresmtpd:
error,relay=::ffff:190.42.107.41,from=<[email protected]>,to=<[email protected]>:
511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:54:29 cuma courieresmtpd:
error,relay=::ffff:190.42.107.41,from=<[email protected]>,to=<[email protected]>:
511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:56:37 cuma courieresmtpd:
error,relay=::ffff:190.42.107.41,from=<[email protected]>,to=<[email protected]>:
511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:58:45 cuma courieresmtpd:
error,relay=::ffff:190.42.107.41,from=<[email protected]>,to=<[email protected]>:
511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:58:45 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,msg="502
ESMTP command error",cmd: DATA
##################################
Broken pipe logs. I am getting thousands of this message per hour:
.
.
.
Apr 17 11:04:05 cuma courieresmtpd:
error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:04:18 cuma courieresmtpd:
error,relay=::ffff:201.20.24.36,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:05:13 cuma courieresmtpd:
error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
Apr 17 11:05:13 cuma courieresmtpd:
error,relay=::ffff:81.184.51.141,msg="writev: Broken pipe",cmd: DATA
Apr 17 11:05:32 cuma courieresmtpd:
error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
Apr 17 11:06:10 cuma courieresmtpd:
error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
Apr 17 11:06:49 cuma courieresmtpd:
error,relay=::ffff:208.43.194.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:07:07 cuma courieresmtpd:
error,relay=::ffff:64.118.95.132,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:07:49 cuma courieresmtpd:
error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:11:04 cuma courieresmtpd:
error,relay=::ffff:200.162.176.8,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:11:30 cuma courieresmtpd:
error,relay=::ffff:174.133.25.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:11:35 cuma courieresmtpd:
error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:12:28 cuma courieresmtpd:
error,relay=::ffff:72.232.211.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:12:35 cuma courieresmtpd:
error,relay=::ffff:72.232.211.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:13:09 cuma courieresmtpd:
error,relay=::ffff:81.215.80.190,msg="writev: Broken pipe",cmd: QUIT
.
.
.
------------------------------------------------------------------------------
Stay on top of everything new and different, both inside and
around Java (TM) technology - register by April 22, and save
$200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
300 plus technical and hands-on sessions. Register today.
Use priority code J9JMT32. http://p.sf.net/sfu/p
_______________________________________________
courier-users mailing list
[email protected]
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users