Patch applied. Current revision is 76. -----Original Message----- From: devel [mailto:[email protected]] On Behalf Of Alexander Malysh Sent: maandag 11 november 2013 18:25 To: [email protected] Cc: [email protected] org Subject: Re: [PATCH] opensmppbox: ESME disconnection fix (plain text)
Hi, similar issue was fixed in bearerbox itself, ++1 Alex Am 09.11.2013 um 06:26 schrieb [email protected]: > Greetings. > > There is an issue of sporadic disconnections of some ESMEs by > OPENSMPPBOX particularly when operated under high load and a high > latency ESME to OPENSMPPBOX network link. Please find the problem > explained in detail below. There is a suggested patch (against Revision: > 75 of OPENSMPPBOX) attached. Kindly review and commit accordingly. > > (A) The problem explained: > > It is tricky to reproduce the bug. However one verified possibility of > occurrence is when the following conditions are met: > > 1) The OPENSMPPBOX's client (ESME) sends submit_sm's with a high > submission rate. (for example, with a high 'max-pending-submits' > setting when using bearerbox as the ESME) > 2) The high submission rate causes the transmission-layer between the > ESME and OPENSMPPBOX to often split a submit_sm PDU across 2 different > TCP segments. > 3) OPENSMPPBOX calls a 'read_pdu' in the 'smpp_to_bearerbox' thread > between the arrival of these 2 segments. > 4) The entire PDU-data is not available and 'smpp_pdu_read_data' call > returns NULL. The 'smpp_to_bearerbox' thread does not retain the value > of the PDU-length in these situations (although it is strictly > required to; refer to comments for 'read_pdu' in the source code) > 5) The successive 'read_pdu' call forgets about the previously > unsuccessful 'smpp_pdu_read_data' call and now attempts to read the > 2nd four octets of the original PDU (via the 'smpp_pdu_read_len' call) > assuming this to be the 'command_length' field of a newly arrived PDU. > (Actually what is now read is the "command_id" field of the original > PDU) > 6) This attempt to read a wrong value as the PDU-length, causes > 'smpp_pdu_read_len' to return -1 if the value read is decoded as less > than 'MIN_SMPP_PDU_LEN'. (This will always be the case for submit_sm > PDU's since the 'command_id' field is 0x00000004 which decodes as 4 > and is less than 16) > 7) Subsequently 'read_pdu' returns -1 and the 'smpp_to_bearerbox' > thread shuts down the ESME connection and terminates. > > > (B) SUGGESTED PATCH (against Revision: 75 of OPENSMPPBOX): > > --- opensmppbox.c (revision 75) > +++ opensmppbox.c (working copy) > @@ -1862,8 +1862,8 @@ > long len; > > box->last_pdu_received = time(NULL); > + len = 0; > while (smppbox_status == SMPP_RUNNING && box->alive) { > - len = 0; > switch (read_pdu(box, conn, &len, &pdu)) { > case -1: > error(0, "Invalid SMPP PDU received."); > > > > (C) Illustration of a reconnection: > > (C.1) Relevant excerpts from the OPENSMPPBOX log-file with log-level = > 0: > > 2013-11-08 23:17:58 [20141] [9] DEBUG: type_name: submit_sm_resp > 2013-11-08 23:17:58 [20141] [9] DEBUG: command_id: 2147483652 = > 0x80000004 > 2013-11-08 23:17:58 [20141] [9] DEBUG: command_status: 0 = 0x00000000 > 2013-11-08 23:17:58 [20141] [9] DEBUG: sequence_number: 38101 = > 0x000094d5 > 2013-11-08 23:17:58 [20141] [9] DEBUG: message_id: > 2013-11-08 23:17:58 [20141] [9] DEBUG: Octet string at > 0x7fcd0c016200: > 2013-11-08 23:17:58 [20141] [9] DEBUG: len: 36 > 2013-11-08 23:17:58 [20141] [9] DEBUG: size: 37 > 2013-11-08 23:17:58 [20141] [9] DEBUG: immutable: 0 > 2013-11-08 23:17:58 [20141] [9] DEBUG: data: 38 36 65 36 38 35 31 > 63 2d 66 64 34 30 2d 34 63 86e6851c-fd40-4c > 2013-11-08 23:17:58 [20141] [9] DEBUG: data: 38 62 2d 39 61 39 37 > 2d 31 62 31 39 63 39 61 30 8b-9a97-1b19c9a0 > 2013-11-08 23:17:58 [20141] [9] DEBUG: data: 38 38 66 30 > 88f0 > 2013-11-08 23:17:58 [20141] [9] DEBUG: Octet string dump ends. > 2013-11-08 23:17:58 [20141] [9] DEBUG: SMPP PDU dump ends. > 2013-11-08 23:17:59 [20141] [10] ERROR: SMPP: PDU length was too small > (4, minimum is 16). > 2013-11-08 23:17:59 [20141] [10] ERROR: opensmppbox[user]: Server sent > garbage, ignored. > 2013-11-08 23:17:59 [20141] [10] ERROR: Invalid SMPP PDU received. > 2013-11-08 23:17:59 [20141] [10] DEBUG: Thread 10 > (opensmppbox.c:smpp_to_bearerbox) terminates. > 2013-11-08 23:17:59 [20141] [9] DEBUG: Thread 9 > (opensmppbox.c:function) terminates. > 2013-11-08 23:18:10 [20141] [0] DEBUG: Started thread 11 > (opensmppbox.c:function) > 2013-11-08 23:18:10 [20141] [11] DEBUG: Thread 11 > (opensmppbox.c:function) maps to pid 20141. > 2013-11-08 23:18:10 [20141] [11] INFO: Client connected from > <XXX.XXX.XXX.XXX> > > Note: > (C.1.1) The last submit_sm_resp is being sent for the 'sequence_number: > 38101' > (C.1.2) Reading the PDU with sequence_number: 38102 is what causes the > ERROR and the subsequent termination of the 'smpp_to_bearerbox' > 'Thread 10' (Refer to notes under section (C.2) below) > (C.1.3) The ESME gets disconnected and initiates a new connection to > OPENSMPPBOX. > > > (C.2) Relevant excerpts from the captured packets analyzed: > > The Reassembled submit_sm PDU with 'sequence_number: 38102' > > 0000 00 00 00 38 00 00 00 04 00 00 00 00 00 00 94 d6 ...8.... > ........ > 0010 00 05 00 58 58 2d 4b 41 4e 4e 45 4c 00 00 01 31 ...XX-KA > NNEL...1 > 0020 32 33 34 35 36 37 38 39 30 00 03 00 00 00 00 01 23456789 > 0....... > 0030 00 00 00 04 54 45 53 54 ....TEST > > > Note: > (C.2.1) The 56 byte submit_sm PDU with 'sequence_number: 38102' > arrives in 2 parts - 40 bytes as the trailing part of the 1st > segment's payload and 16 bytes in the leading part of the second segment's payload. > (C.2.2) The delay in arrival of the 2nd segment makes PDU-data > unavailable to be read in its entirety. The original 'command_length' > field 0x00000038 is discarded. > (C.2.3) The 2nd four octets 0x00000004 (which is in fact the original > 'command_id' field) is now read in as the PDU-length causing the error. > > > P.S: This message is being posted to the KANNEL developers' mailing > list directly because the registration process at redmine.kannel.org > failed to sent a confirmation email. > > Thanks and Best Regards > > Juno Lopez > Director - R&D > > APRICOT > Trivandrum, India > http://apricot.net.in > > <patch_wrt_75.txt>
