Greetings.
There is an issue of sporadic disconnections of some ESMEs by
OPENSMPPBOX particularly when operated under high load and a high
latency ESME to OPENSMPPBOX network link. Please find the problem
explained in detail below. There is a suggested patch (against Revision:
75 of OPENSMPPBOX) attached. Kindly review and commit accordingly.
(A) The problem explained:
It is tricky to reproduce the bug. However one verified possibility of
occurrence is when the following conditions are met:
1) The OPENSMPPBOX's client (ESME) sends submit_sm's with a high
submission rate. (for example, with a high 'max-pending-submits' setting
when using bearerbox as the ESME)
2) The high submission rate causes the transmission-layer between the
ESME and OPENSMPPBOX to often split a submit_sm PDU across 2 different
TCP segments.
3) OPENSMPPBOX calls a 'read_pdu' in the 'smpp_to_bearerbox' thread
between the arrival of these 2 segments.
4) The entire PDU-data is not available and 'smpp_pdu_read_data' call
returns NULL. The 'smpp_to_bearerbox' thread does not retain the value
of the PDU-length in these situations (although it is strictly required
to; refer to comments for 'read_pdu' in the source code)
5) The successive 'read_pdu' call forgets about the previously
unsuccessful 'smpp_pdu_read_data' call and now attempts to read the 2nd
four octets of the original PDU (via the 'smpp_pdu_read_len' call)
assuming this to be the 'command_length' field of a newly arrived PDU.
(Actually what is now read is the "command_id" field of the original
PDU)
6) This attempt to read a wrong value as the PDU-length, causes
'smpp_pdu_read_len' to return -1 if the value read is decoded as less
than 'MIN_SMPP_PDU_LEN'. (This will always be the case for submit_sm
PDU's since the 'command_id' field is 0x00000004 which decodes as 4 and
is less than 16)
7) Subsequently 'read_pdu' returns -1 and the 'smpp_to_bearerbox' thread
shuts down the ESME connection and terminates.
(B) SUGGESTED PATCH (against Revision: 75 of OPENSMPPBOX):
--- opensmppbox.c (revision 75)
+++ opensmppbox.c (working copy)
@@ -1862,8 +1862,8 @@
long len;
box->last_pdu_received = time(NULL);
+ len = 0;
while (smppbox_status == SMPP_RUNNING && box->alive) {
- len = 0;
switch (read_pdu(box, conn, &len, &pdu)) {
case -1:
error(0, "Invalid SMPP PDU received.");
(C) Illustration of a reconnection:
(C.1) Relevant excerpts from the OPENSMPPBOX log-file with log-level =
0:
2013-11-08 23:17:58 [20141] [9] DEBUG: type_name: submit_sm_resp
2013-11-08 23:17:58 [20141] [9] DEBUG: command_id: 2147483652 =
0x80000004
2013-11-08 23:17:58 [20141] [9] DEBUG: command_status: 0 = 0x00000000
2013-11-08 23:17:58 [20141] [9] DEBUG: sequence_number: 38101 =
0x000094d5
2013-11-08 23:17:58 [20141] [9] DEBUG: message_id:
2013-11-08 23:17:58 [20141] [9] DEBUG: Octet string at
0x7fcd0c016200:
2013-11-08 23:17:58 [20141] [9] DEBUG: len: 36
2013-11-08 23:17:58 [20141] [9] DEBUG: size: 37
2013-11-08 23:17:58 [20141] [9] DEBUG: immutable: 0
2013-11-08 23:17:58 [20141] [9] DEBUG: data: 38 36 65 36 38 35 31
63 2d 66 64 34 30 2d 34 63 86e6851c-fd40-4c
2013-11-08 23:17:58 [20141] [9] DEBUG: data: 38 62 2d 39 61 39 37
2d 31 62 31 39 63 39 61 30 8b-9a97-1b19c9a0
2013-11-08 23:17:58 [20141] [9] DEBUG: data: 38 38 66 30
88f0
2013-11-08 23:17:58 [20141] [9] DEBUG: Octet string dump ends.
2013-11-08 23:17:58 [20141] [9] DEBUG: SMPP PDU dump ends.
2013-11-08 23:17:59 [20141] [10] ERROR: SMPP: PDU length was too small
(4, minimum is 16).
2013-11-08 23:17:59 [20141] [10] ERROR: opensmppbox[user]: Server sent
garbage, ignored.
2013-11-08 23:17:59 [20141] [10] ERROR: Invalid SMPP PDU received.
2013-11-08 23:17:59 [20141] [10] DEBUG: Thread 10
(opensmppbox.c:smpp_to_bearerbox) terminates.
2013-11-08 23:17:59 [20141] [9] DEBUG: Thread 9 (opensmppbox.c:function)
terminates.
2013-11-08 23:18:10 [20141] [0] DEBUG: Started thread 11
(opensmppbox.c:function)
2013-11-08 23:18:10 [20141] [11] DEBUG: Thread 11
(opensmppbox.c:function) maps to pid 20141.
2013-11-08 23:18:10 [20141] [11] INFO: Client connected from
<XXX.XXX.XXX.XXX>
Note:
(C.1.1) The last submit_sm_resp is being sent for the 'sequence_number:
38101'
(C.1.2) Reading the PDU with sequence_number: 38102 is what causes the
ERROR and the subsequent termination of the 'smpp_to_bearerbox' 'Thread
10' (Refer to notes under section (C.2) below)
(C.1.3) The ESME gets disconnected and initiates a new connection to
OPENSMPPBOX.
(C.2) Relevant excerpts from the captured packets analyzed:
The Reassembled submit_sm PDU with 'sequence_number: 38102'
0000 00 00 00 38 00 00 00 04 00 00 00 00 00 00 94 d6 ...8....
........
0010 00 05 00 58 58 2d 4b 41 4e 4e 45 4c 00 00 01 31 ...XX-KA
NNEL...1
0020 32 33 34 35 36 37 38 39 30 00 03 00 00 00 00 01 23456789
0.......
0030 00 00 00 04 54 45 53 54 ....TEST
Note:
(C.2.1) The 56 byte submit_sm PDU with 'sequence_number: 38102' arrives
in 2 parts - 40 bytes as the trailing part of the 1st segment's payload
and 16 bytes in the leading part of the second segment's payload.
(C.2.2) The delay in arrival of the 2nd segment makes PDU-data
unavailable to be read in its entirety. The original 'command_length'
field 0x00000038 is discarded.
(C.2.3) The 2nd four octets 0x00000004 (which is in fact the original
'command_id' field) is now read in as the PDU-length causing the error.
P.S: This message is being posted to the KANNEL developers' mailing list
directly because the registration process at redmine.kannel.org failed
to sent a confirmation email.
Thanks and Best Regards
Juno Lopez
Director - R&D
APRICOT
Trivandrum, India
http://apricot.net.in
--- opensmppbox.c (revision 75)
+++ opensmppbox.c (working copy)
@@ -1862,8 +1862,8 @@
long len;
box->last_pdu_received = time(NULL);
+ len = 0;
while (smppbox_status == SMPP_RUNNING && box->alive) {
- len = 0;
switch (read_pdu(box, conn, &len, &pdu)) {
case -1:
error(0, "Invalid SMPP PDU received.");