We're looking for some help/advice with a Kannel problem we've just
encountered.
- We're using Kannel CVS (up-to-date as of today, 28/07/2006)
- Kannel is built with --prefix=/opt/kannel --enable-start-stop-daemon
- Kannel runs on a 2.8 GHz Intel Xeon with 2 GB RAM, under CentOS 4.3
with all the latest updates
- We are using two instances of the OpenSMPP SMSC simulator, running on
two separate machines (with similar specs to the above), which together
generate 200000 messages as fast as they can.
- Kannel is delivering to a stock httpd running on a separate machine,
which returns a small (6 byte) response body.
Our kannel.conf is as follows:
# Core configuration group
group = core
admin-port = 13000
admin-password = xyz123
smsbox-port = 13010
log-file = /opt/kannel/log/bearer.log
log-level = 0
access-log = /opt/kannel/log/kannel_sms_traffic.log
store-file = /tmp/kannel_sms_spool
unified-prefix = "-,+"

# SmsBox configuration
group = smsbox
bearerbox-host = localhost
log-file = /opt/kannel/log/smsbox.log
log-level = 0
access-log = /opt/kannel/log/kannel_sms_traffic.log
reply-couldnotfetch = ""
reply-couldnotrepresent = ""
reply-requestfailed = ""
http-request-retry = 0
sendsms-port = 13189

include = "/opt/kannel/conf/test.conf"

# Kannel Push Service
group = sendsms-user
username = kannelpush
password = kannelpass
name = kannelpush
user-allow-ip = "*.*.*.*"
max-messages = 10

# Mappings
group = sms-service
name = nextenso_gw
keyword = default
get-url = http://10.100.123.20/test.html?msgData=%a&sourceAddr=%p&channel=%i&destinationAddr=%P
max-messages = 1
catch-all = true
concatenation = false
split-chars = " "
omit-empty = true
...and test.conf is:
# Test 1
group = smsc
smsc = smpp
smsc-id = test
host = smsc1
port = 7011
receive-port = 7011
transceiver-mode = false
smsc-username = test
smsc-password = test
system-type = smpp
address-range = ""
source-addr-ton = 0
source-addr-npi = 0
dest-addr-ton = 0
dest-addr-npi = 0
alt-charset="ASCII"
enquire-link-interval = 30

# Test 2
group = smsc
smsc = smpp
smsc-id = test
host = smsc2
port = 7011
receive-port = 7011
transceiver-mode = false
smsc-username = test
smsc-password = test
system-type = smpp
address-range = ""
source-addr-ton = 0
source-addr-npi = 0
dest-addr-ton = 0
dest-addr-npi = 0
alt-charset="ASCII"
enquire-link-interval = 30
Our problem is this: when injecting messages, we get a throughput in
excess of 600 inbound SMPP MO messages per second. Shortly after we
start injecting, a large number of errors like the following appear in
smsbox.log:
2006-07-28 16:55:23 [18715] [4] INFO: Starting to service <testXXX> from <61432123123> to <1234>
2006-07-28 16:55:23 [18715] [9] ERROR: Couldn't create new socket.
2006-07-28 16:55:23 [18715] [9] ERROR: System error 24: Too many open files
2006-07-28 16:55:23 [18715] [9] ERROR: error connecting to server `10.100.123.20' at port `80'
2006-07-28 16:55:23 [18715] [9] ERROR: Couldn't send request to <http://10.100.123.20/test.html?msgData=testXXX&sourceAddr=61432123123&channel=test&destinationAddr=1234>
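System error 24 is errno EMFILE ("Too many open files"), which socket()
returns once a process has reached its per-process descriptor limit.
The stand-alone sketch below reproduces the same failure outside Kannel
by temporarily lowering the soft RLIMIT_NOFILE and opening sockets until
socket() fails. exhaust_sockets() is a hypothetical test helper of ours,
not Kannel code.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical test helper (not part of Kannel): temporarily lower this
 * process's soft RLIMIT_NOFILE to `limit`, then call socket() until it
 * fails. Returns the errno of the failing call (EMFILE when the
 * per-process limit is the cause), or 0 if `limit` sockets were opened
 * without error. Every socket opened here is closed and the original
 * limit is restored before returning.
 */
int exhaust_sockets(rlim_t limit)
{
    struct rlimit saved, lowered;
    int *fds;
    rlim_t count = 0;
    int err = 0;

    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return errno;
    lowered = saved;
    lowered.rlim_cur = limit;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return errno;

    fds = malloc(sizeof(int) * limit);
    if (fds == NULL) {
        setrlimit(RLIMIT_NOFILE, &saved);
        return ENOMEM;
    }

    /* Same failure path as "Couldn't create new socket" in smsbox.log. */
    while (count < limit) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            err = errno;
            break;
        }
        fds[count++] = fd;
    }

    /* Release everything we opened, then put the old limit back. */
    while (count > 0)
        close(fds[--count]);
    free(fds);
    setrlimit(RLIMIT_NOFILE, &saved);
    return err;
}
```

Calling exhaust_sockets(16) in a fresh process should return EMFILE,
whose numeric value on Linux is 24, matching the log line above.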
Using 'lsof', and adjusting the 'nofile' limit in
'/etc/security/limits.conf', confirms that Kannel is hitting the
system's per-process limit on open files when it tries to create
connections to the HTTP server. The errors are generated in the function
'static Connection *get_connection(HTTPServer *trans)' in gwlib/http.c.
As a result, even though Kannel's inbound counter indicates that all the
incoming messages have been received, many of them are never delivered
to the HTTP server. They are not retransmitted, and are effectively lost.
We suspect a problem with cleaning up connections to the HTTP server
after they have been used: lsof shows that, once traffic through Kannel
has stopped, it can take some time (up to a minute or two) for the
number of open file descriptors held by the Kannel processes to drop.
Also, when the problem occurs, lsof shows a large number of strange
sockets, e.g.:
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
...
smsbox 9614 kannel 29u sock 0,4 6497455 can't identify protocol
smsbox 9614 kannel 30u sock 0,4 6497456 can't identify protocol
smsbox 9614 kannel 31u sock 0,4 6497457 can't identify protocol
smsbox 9614 kannel 32u sock 0,4 6497458 can't identify protocol
...
smsbox 9614 kannel 741u sock 0,4 6497872 can't identify protocol
smsbox 9614 kannel 742u sock 0,4 6497873 can't identify protocol
smsbox 9614 kannel 743u sock 0,4 6497874 can't identify protocol
smsbox 9614 kannel 744u IPv4 6483551 TCP 10.100.123.199:53061->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 745u sock 0,4 6493107 can't identify protocol
smsbox 9614 kannel 746u sock 0,4 6493108 can't identify protocol
smsbox 9614 kannel 747u sock 0,4 6493109 can't identify protocol
smsbox 9614 kannel 748u sock 0,4 6493110 can't identify protocol
smsbox 9614 kannel 749u sock 0,4 6493111 can't identify protocol
smsbox 9614 kannel 750u sock 0,4 6493112 can't identify protocol
smsbox 9614 kannel 751u sock 0,4 6493113 can't identify protocol
smsbox 9614 kannel 752u sock 0,4 6493114 can't identify protocol
smsbox 9614 kannel 753u sock 0,4 6493115 can't identify protocol
smsbox 9614 kannel 754u sock 0,4 6477537 can't identify protocol
smsbox 9614 kannel 755u sock 0,4 6477538 can't identify protocol
smsbox 9614 kannel 756u sock 0,4 6477539 can't identify protocol
smsbox 9614 kannel 757u sock 0,4 6477540 can't identify protocol
smsbox 9614 kannel 758u IPv4 6483552 TCP 10.100.123.199:53062->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 759u IPv4 6483553 TCP 10.100.123.199:53063->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 760u IPv4 6483554 TCP 10.100.123.199:53064->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 761u IPv4 6483555 TCP 10.100.123.199:53065->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 762u IPv4 6483556 TCP 10.100.123.199:53066->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 763u IPv4 6483557 TCP 10.100.123.199:53067->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 764u IPv4 6483558 TCP 10.100.123.199:53068->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 765u IPv4 6483559 TCP 10.100.123.199:53069->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 766u sock 0,4 6477541 can't identify protocol
smsbox 9614 kannel 767u IPv4 6483560 TCP 10.100.123.199:53070->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 768u IPv4 6483561 TCP 10.100.123.199:53071->10.100.123.20:http (SYN_SENT)
smsbox 9614 kannel 769u sock 0,4 6493116 can't identify protocol
smsbox 9614 kannel 770u sock 0,4 6493117 can't identify protocol
smsbox 9614 kannel 771u sock 0,4 6493118 can't identify protocol
...
smsbox 9614 kannel 1006u sock 0,4 6447308 can't identify protocol
smsbox 9614 kannel 1007u sock 0,4 6447309 can't identify protocol
smsbox 9614 kannel 1008u sock 0,4 6493212 can't identify protocol
smsbox 9614 kannel 1009u sock 0,4 6493213 can't identify protocol
smsbox 9614 kannel 1010u sock 0,4 6493214 can't identify protocol
...
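A rough back-of-envelope estimate shows how quickly this bites at our
message rate. It is our own estimate, not a measurement, and it assumes
one new HTTP connection per message and the common default per-process
limit of 1024 descriptors (consistent with the listing above reaching fd
1010). By Little's law, the average number of sockets open at once, N,
is the rate at which they are opened, lambda, times the average time
each one stays open, W:

```latex
N = \lambda W, \qquad \lambda \approx 600~\mathrm{s}^{-1}

N \le 1024 \;\Longrightarrow\; W \le \frac{1024}{600~\mathrm{s}^{-1}} \approx 1.7~\mathrm{s}

W \approx 60~\mathrm{s} \;\Longrightarrow\; N \approx 600 \times 60 = 36\,000 \gg 1024
```

So if each descriptor lingers for more than about 1.7 seconds, the limit
is exhausted. With the minute-long linger we observe, roughly 36000
descriptors would be needed at steady state.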
Limiting the number of concurrent connections that the HTTP server
accepts doesn't help: Kannel always seems to have more open file
descriptors than it has connections to the HTTP server.
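Our guess, which we have not confirmed by tracing gwlib/http.c, is that
the descriptors lsof reports as "can't identify protocol" are sockets
that were created but never closed, for instance after a connect() that
failed or was abandoned. The usual defence is to close the socket on
every error path. The following is a minimal sketch of that pattern.
connect_or_close() is a hypothetical helper of ours, not part of Kannel.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a TCP connection to ip:port (IPv4 dotted
 * quad). On any failure the socket is closed before returning -1, so a
 * refused or failed connect cannot leave an orphaned descriptor behind.
 * On success the caller owns the returned fd and must close() it.
 */
int connect_or_close(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd, saved_errno;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;              /* no socket created yet */
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;              /* e.g. EMFILE: nothing to clean up */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        saved_errno = errno;    /* close() may overwrite errno */
        close(fd);              /* the step that prevents the leak */
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

The design point is simply that ownership of the descriptor is
unambiguous: on failure it is already closed, and on success exactly one
caller is responsible for closing it.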
So: can someone confirm whether this really is a problem with Kannel?
(I would be very surprised if Kannel had been in production use all this
time with this kind of behaviour under load.) Or is our configuration
wrong? How should we configure Kannel to deal with this kind of
situation?
Also, we noticed these comments in gwlib/http.c:
/* XXX re-implement socket pools, with idle connection killing to save sockets */
/* XXX set maximum number of concurrent connections to same host, total? */
These look as though they may be directly related to the problem we're
seeing. Is anyone working on these tasks and, if so, is there an ETA for
their implementation?
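For reference, this is roughly what we understand those two comments to
describe: a per-host pool that caps the total number of descriptors and
closes connections left idle past a timeout. It is a minimal,
hypothetical sketch. struct conn_pool, pool_init(), pool_get(),
pool_put() and pool_reap() are our own names, not anything in gwlib, and
opening the actual connection is delegated to a caller-supplied
function.

```c
#include <time.h>
#include <unistd.h>

#define POOL_MAX 64             /* hypothetical hard cap per host */

/* Hypothetical per-host pool: in_use + n_idle never exceeds max_total. */
struct conn_pool {
    int (*open_conn)(void *ctx);  /* opens one connection, fd or -1 */
    void *ctx;
    int max_total;                /* <= POOL_MAX */
    int in_use;                   /* fds currently lent out */
    int n_idle;                   /* fds parked in idle_fd[] */
    int idle_fd[POOL_MAX];
    time_t idle_since[POOL_MAX];
    time_t idle_timeout;          /* seconds */
};

void pool_init(struct conn_pool *p, int (*open_conn)(void *), void *ctx,
               int max_total, time_t idle_timeout)
{
    p->open_conn = open_conn;
    p->ctx = ctx;
    p->max_total = max_total < POOL_MAX ? max_total : POOL_MAX;
    p->in_use = 0;
    p->n_idle = 0;
    p->idle_timeout = idle_timeout;
}

/* Reuse the most recently parked idle connection if there is one;
 * otherwise open a new one only while under the cap. Returns -1 at the
 * cap, so the caller can queue the request instead of opening sockets
 * until the process runs out of descriptors. */
int pool_get(struct conn_pool *p)
{
    int fd;

    if (p->n_idle > 0) {
        fd = p->idle_fd[--p->n_idle];
        p->in_use++;
        return fd;
    }
    if (p->in_use >= p->max_total)
        return -1;
    fd = p->open_conn(p->ctx);
    if (fd >= 0)
        p->in_use++;
    return fd;
}

/* Hand a connection back. Pass reusable = 0 if the server closed it or
 * the request failed; it is then closed immediately. */
void pool_put(struct conn_pool *p, int fd, int reusable, time_t now)
{
    p->in_use--;
    if (!reusable || p->n_idle >= p->max_total) {
        close(fd);
        return;
    }
    p->idle_fd[p->n_idle] = fd;
    p->idle_since[p->n_idle] = now;
    p->n_idle++;
}

/* "Idle connection killing": close connections idle longer than
 * idle_timeout, keeping the rest in order. Returns how many it closed. */
int pool_reap(struct conn_pool *p, time_t now)
{
    int i, kept = 0, closed = 0;

    for (i = 0; i < p->n_idle; i++) {
        if (now - p->idle_since[i] > p->idle_timeout) {
            close(p->idle_fd[i]);
            closed++;
        } else {
            p->idle_fd[kept] = p->idle_fd[i];
            p->idle_since[kept] = p->idle_since[i];
            kept++;
        }
    }
    p->n_idle = kept;
    return closed;
}
```

A caller that gets -1 from pool_get() would queue the message and try
again later rather than opening yet another socket, and pool_reap()
would be driven from a timer (say, once a second) so that idle sockets
are released promptly instead of lingering.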
Thanks,
--
Giulio Harding
Systems Administrator
m.Net Corporation
Level 13, 99 Gawler Place
Adelaide SA 5000, Australia
Tel: +61 8 8210 2041
Fax: +61 8 8211 9620
Mobile: 0432 876 733
http://www.mnetcorporation.com