[PATCH] Re: Kannel meta-data (TLV) branch mangles MTs with carriage return characters received via HTTP response to MO?

Alexander Malysh Wed, 23 Jan 2008 09:50:47 -0800

Hi,

the fix is pretty simple and bug only occurs if split-chars defined in
config, therefore I was unable to reproduce this bug for sooo long :)


Please try attached patch that fixes it for me.

Giulio Harding wrote:

> I think this is the HTTP service involved (we have a few):
> 
> group = sms-service
> name = kannel_gw
> keyword = default
> #get-url = http://10.110.123.101:8404/?msgData=%a&sourceAddr=
> %p&channel=%i&destinationAddr=%P
> get-url = http://localhost/kannel?msgData=%a&sourceAddr=%p&channel=
> %i&destinationAddr=%P&tlv=%D
> max-messages = 1
> catch-all = true
> concatenation = false
> split-chars = " "
> omit-empty = true
> 
> This is a configuration that I inherited many years back - I didn't
> know what the split-chars parameter was for until now, though I still
> don't fully understand how it's used.
> 
> On 24/01/2008, at 12:25 AM, Alexander Malysh wrote:
> 
>> Hi,
>>
>> did you define 'split-chars' in your config and if so please post it
>> here,
>> so I can test it.
>>
>> Giulio Harding wrote:
>>
>>> I'm kind of stumped with this problem, so I figured I'd write up a
>>> summary of my investigation so far, in case someone else might have
>>> an idea...
>>>
>>> On 23/01/2008, at 12:34 AM, Giulio Harding wrote:
>>>
>>>> I've actually had to back out of using the meta-data Kannel, as we
>>>> spotted what looks to be a problem in the way Kannel handles
>>>> carriage return characters (ctrl-M) in synchronous responses to MO
>>>> HTTP requests - the carriage returns arrive at smsbox fine, but by
>>>> the time they leave the bearerbox in an SMPP PDU, they somehow end
>>>> up actually mangling the MT message body (text after the carriage
>>>> returns is missing, subsequent text segments overwrite the start of
>>>> the message body, etc.). Asynchronous MT HTTP requests aren't
>>>> affected...
>>>>
>>>> Not 100% sure whether it's a Kannel problem or something else (site-
>>>> specific) - I'll do a bit more investigation tomorrow morning and
>>>> get back to you ASAP.
>>>
>>>
>>> When we tested the patched TLV-enabled branch of Kannel (hereby
>>> referred to as *NEW* Kannel) a few days ago, everything worked fine,
>>> except that some messages coming from one particular application
>>> appeared to be being mangled by Kannel - from our kannel traffic log:
>>>
>>> ...
>>> 2008-01-22 03:10:27 SMS HTTP-request sender:XXX request: 'Azure
>>> hotspot PIN^Mfrom Vodafone^MXXXXXXXX' url: 'http://
>>> 10.110.123.101:8404/?
>>> msgData
>>> =Go&sourceAddr=XXX&channel=vodafonechmt&destinationAddr=19929873'
>>>  reply: 200 '<< successful >>'
>>> 2008-01-22 03:10:28 Sent SMS [SMSC:vodafonechmt] [SVC:kannel_gw]
>>> [ACT:] [BINF:] [from:19929873] [to:XXX] [flags:-1:0:-1:-1:-1] [msg:
>>> 23:Azure hotspot PIN.from ] [udh:0:] [id:
>>> 7936719e-1c7c-4e90-8161-6f14af4f23f7]
>>> ...
>>>
>>> So, going in, the message is intact, but going out, it's had a '.'
>>> character inserted, and seems to be truncated.
>>>
>>> This wasn't a problem with our original Kannel instance (hereby
>>> referred to as *OLD* Kannel) - Kannel bearerbox version
>>> `cvs-20060727'. Build `Apr 12 2007 17:01:36', compiler `3.4.6
>>> 20060404 (Red Hat 3.4.6-3)'. System Linux, release
>>> 2.6.9-55.0.2.ELsmp, version #1 SMP Tue Jun 26 14:14:47 EDT 2007,
>>> machine x86_64. Hostname smsgw2.appgw.mnetcorporation.com, IP
>>> 10.110.123.31. Libxml version 2.6.16. Using native malloc.
>>>
>>> ...
>>> 2008-01-22 00:43:57 SMS HTTP-request sender:XXX request: 'Azure
>>> hotspot PIN^Mfrom Optus^MXXXXXXXX' url: 'http://10.110.123.101:8404/?
>>> msgData=Go&sourceAddr=XXX&channel=optuschmt&destinationAddr=19929873'
>>> reply: 200 '<< successful >>'
>>> 2008-01-22 00:43:57 Sent SMS [SMSC:optuschmt] [SVC:kannel_gw] [ACT:]
>>> [BINF:] [from:19929873] [to:XXX] [flags:-1:0:-1:-1:-1] [msg:37:Azure
>>> hotspot PIN^Mfrom Optus^MXXXXXXXX] [udh:0:] [id:86dfd47e-0106-4700-
>>> ab48-47381c43047c]
>>> ...
>>>
>>> Also, part of the problem seems to be triggered by the presence of
>>> the carriage return characters (^M) and part seems to be their
>>> presence in the HTTP response to an MO request.
>>>
>>> In *OLD* Kannel, a separate MT request with the same text works fine:
>>>
>>> ...
>>> 2008-01-23 23:21:12 send-SMS request added - sender:kannelpush:test
>>> 10.110.123.101 target:XXX request: 'Azure hotspot PIN^Mfrom
>>> Optus^MXXXXXXXXX'
>>> 2008-01-23 23:21:13 Sent SMS [SMSC:mbloxdirect] [SVC:kannelpush]
>>> [ACT:] [BINF:] [from:test] [to:XXX] [flags:-1:0:-1:-1:-1] [msg:
>>> 37:Azure hotspot PIN^Mfrom Optus^MXXXXXXXX] [udh:0:] [id:09ba2907-
>>> f013-46ff-9ec0-de006422d0b4]
>>> ...
>>>
>>> But in *NEW* Kannel, a separate MT request with the same text still
>>> has problems: it's not truncated any more, but it has '.' characters
>>> instead of carriage returns:
>>>
>>> ...
>>> 2008-01-22 13:49:55 send-SMS request added - sender:kannelpush:
>>> 19996737 127.0.0.1 target:XXXX request: 'Azure hotspot PIN^Mfrom
>>> Optus^MXXXXXXXX'
>>> 2008-01-22 13:49:56 Sent SMS [SMSC:mbloxpsms] [SVC:kannelpush] [ACT:]
>>> [BINF:] [from:19996737] [to:XXX] [flags:-1:0:-1:-1:-1] [msg:37:Azure
>>> hotspot PIN.from Optus.XXXXXXXX] [udh:0:]
>>> [id:d9308e85-4790-4714-9ac9-48a72159ac31]
>>> ...
>>>
>>> So, with a few extra debug/trace statements, I was able to narrow
>>> down the mangling as possibly occurring in the
>>> extract_msgdata_part_by_coding function, in gw/sms.c:
>>>
>>>
>>>
>>> static Octstr *extract_msgdata_part_by_coding(Msg *msg, Octstr
>>> *split_chars,
>>>         int max_part_len)
>>> {
>>>     Octstr *temp = NULL, *temp_utf;
>>>
>>>     if (msg->sms.coding == DC_8BIT || msg->sms.coding == DC_UCS2) {
>>>         /* nothing to do here, just call the original
>>> extract_msgdata_part */
>>>         return extract_msgdata_part(msg->sms.msgdata, split_chars,
>>> max_part_len);
>>>     }
>>>
>>>     /* convert to and the from gsm, so we drop all non GSM chars */
>>>     charset_utf8_to_gsm(msg->sms.msgdata);
>>>     charset_gsm_to_utf8(msg->sms.msgdata);
>>>
>>>     /*
>>>      * else we need to do something special. I'll just get
>>> charset_gsm_truncate to
>>>      * cut the string to the required length and then count real
>>> characters.
>>>      */
>>>      temp = octstr_duplicate(msg->sms.msgdata);
>>>      charset_utf8_to_gsm(temp);
>>>      charset_gsm_truncate(temp, max_part_len);
>>>
>>>      /* calculate utf-8 length */
>>>      temp_utf = octstr_duplicate(temp);
>>>      charset_gsm_to_utf8(temp_utf);
>>>      max_part_len = octstr_len(temp_utf);
>>>
>>>      octstr_destroy(temp);
>>>      octstr_destroy(temp_utf);
>>>
>>>      /* now just call the original extract_msgdata_part with the new
>>> length */
>>>      return extract_msgdata_part(msg->sms.msgdata, split_chars,
>>> max_part_len);
>>> }
>>>
>>>
>>> This has changed noticeably from our *OLD* Kannel:
>>>
>>>
>>> static Octstr *extract_msgdata_part_by_coding(Msg *msg, Octstr
>>> *split_chars,
>>>         int max_part_len)
>>> {
>>>     Octstr *temp = NULL;
>>>     int pos, esc_count;
>>>
>>>     if (msg->sms.coding == DC_8BIT || msg->sms.coding == DC_UCS2) {
>>>         /* nothing to do here, just call the original
>>> extract_msgdata_part */
>>>         return extract_msgdata_part(msg->sms.msgdata, split_chars,
>>> max_part_len);
>>>     }
>>>
>>>     /*
>>>      * else we need to do something special. I'll just get
>>> charset_gsm_truncate to
>>>      * cut the string to the required length and then count real
>>> characters.
>>>      */
>>>      temp = octstr_duplicate(msg->sms.msgdata);
>>>      charset_latin1_to_gsm(temp);
>>>      charset_gsm_truncate(temp, max_part_len);
>>>
>>>      pos = esc_count = 0;
>>>
>>>      while ((pos = octstr_search_char(temp, 27, pos)) != -1) {
>>>         ++pos;
>>>          ++esc_count;
>>>      }
>>>
>>>      octstr_destroy(temp);
>>>
>>>      /* now just call the original extract_msgdata_part with the new
>>> length */
>>>      return extract_msgdata_part(msg->sms.msgdata, split_chars,
>>> max_part_len - esc_count);
>>> }
>>>
>>>
>>> The biggest change is the use of UTF-8 as the internal encoding, I
>>> think - however, one part of the mangling, the truncation, appears to
>>> be happening in the extract_msgdata_part function. I jammed in some
>>> extra traces, like so:
>>>
>>>
>>>
>>> static Octstr *extract_msgdata_part(Octstr *msgdata, Octstr
>>> *split_chars,
>>>                                     int max_part_len)
>>> {
>>> debug("sms", 0, "X1: %s", octstr_get_cstr(msgdata));
>>>     long i, len;
>>>     Octstr *part;
>>>
>>> debug("sms", 0, "split: %s", octstr_get_cstr(split_chars));
>>> debug("sms", 0, "max_part_len: %i", max_part_len);
>>>     len = max_part_len;
>>>     if (split_chars != NULL) {
>>> debug("sms", 0, "split_chars not null");
>>>         for (i = max_part_len; i > 0; i--) {
>>> debug("sms", 0, "working back from max_part_len: %i", i);
>>>             if (octstr_search_char(split_chars,
>>>                                    octstr_get_char(msgdata, i - 1),
>>> 0) != -1) {
>>> debug("sms", 0, "something something, at char %i ->%c<-", i,
>>> octstr_get_char(msgdata, i - 1));
>>>                 len = i;
>>>                 break;
>>>             }
>>>         }
>>>     }
>>>     part = octstr_copy(msgdata, 0, len);
>>> debug("sms", 0, "len: %i", len);
>>> debug("sms", 0, "X2: %s", octstr_get_cstr(part));
>>>     octstr_delete(msgdata, 0, len);
>>>     return part;
>>> }
>>>
>>>
>>> And got the following output:
>>>
>>>
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: X1: Azure hotspot PIN^Mfrom
>>> Vodafone^MXXXXXXXX
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: split:
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: max_part_len: 41
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: split_chars not null
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 41
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 40
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 39
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 38
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 37
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 36
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 35
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 34
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 33
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 32
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 31
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 30
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 29
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 28
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 27
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 26
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 25
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 24
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
>>> max_part_len: 23
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: something something, at char
>>> 23 -> <-
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: len: 23
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: X2: Azure hotspot PIN^Mfrom
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: Extract: Azure hotspot
>>> PIN^Mfrom
>>> 2008-01-23 22:31:34 [17246] [5] DEBUG: message length 41, sending 1
>>> messages
>>>
>>>
>>> So, that loop appears to be truncating the message text - but I'm not
>>> sure why. Can someone explain that code? Is that behaviour incorrect?
>>> (It looks like it to me)
>>>
>>> As for the carriage returns becoming '.' characters, I haven't
>>> located where that's occurring, but I'm wondering whether that might
>>> be due to the switch to UTF8 internally?
>>>
>>> Anyway, like I said, I'm a bit out of my depth here - does anyone
>>> have any ideas? Is anyone else able to replicate this strange
>>> behaviour with carriage returns in the meta_data branch? (or even the
>>> latest CVS snapshot, since this doesn't seem to be TLV-specific...)
>>>
>>> Thanks,
>>>
>>> --
>>> Giulio Harding
>>> Systems Administrator
>>>
>>> m.Net Corporation
>>> Level 2, 8 Leigh Street
>>> Adelaide SA 5000, Australia
>>>
>>> Tel: +61 8 8210 2041
>>> Fax: +61 8 8211 9620
>>> Mobile: 0432 876 733
>>> Yahoo: giulio.harding
>>> MSN: [EMAIL PROTECTED]
>>>
>>> http://www.mnetcorporation.com
>>
>> --
>> Thanks,
>> Alex
>>
>>
> 
> --
> Giulio Harding
> Systems Administrator
> 
> m.Net Corporation
> Level 2, 8 Leigh Street
> Adelaide SA 5000, Australia
> 
> Tel: +61 8 8210 2041
> Fax: +61 8 8211 9620
> Mobile: 0432 876 733
> MSN: [EMAIL PROTECTED]
> 
> http://www.mnetcorporation.com

-- 
Thanks,
Alex

=== gw/sms.c
==================================================================
--- gw/sms.c	(revision 315)
+++ gw/sms.c	(local)
@@ -251,7 +251,7 @@
     Octstr *part;
 
     len = max_part_len;
-    if (split_chars != NULL)
+    if (max_part_len < octstr_len(msgdata) && split_chars != NULL)
 	for (i = max_part_len; i > 0; i--)
 	    if (octstr_search_char(split_chars,
 				   octstr_get_char(msgdata, i - 1), 0) != -1) {

[PATCH] Re: Kannel meta-data (TLV) branch mangles MTs with carriage return characters received via HTTP response to MO?

Reply via email to