Re: Kannel meta-data (TLV) branch mangles MTs with carriage return characters received via HTTP response to MO?

Giulio Harding Wed, 23 Jan 2008 06:16:55 -0800

I think this is the HTTP service involved (we have a few):


group = sms-service
name = kannel_gw
keyword = default

#get-url = http://10.110.123.101:8404/?msgData=%a&sourceAddr=%p&channel=%i&destinationAddr=%Pget-url = http://localhost/kannel?msgData=%a&sourceAddr=%p&channel=%i&destinationAddr=%P&tlv=%D

max-messages = 1
catch-all = true
concatenation = false
split-chars = " "
omit-empty = true

This is a configuration that I inherited many years back - I didn'tknow what the split-chars parameter was for until now, though I stilldon't fully understand how it's used.


On 24/01/2008, at 12:25 AM, Alexander Malysh wrote:

Hi,

did you define 'split-chars' in your config and if so please post ithere,

so I can test it.

Giulio Harding wrote:

I'm kind of stumped with this problem, so I figured I'd write up a
summary of my investigation so far, in case someone else might have
an idea...

On 23/01/2008, at 12:34 AM, Giulio Harding wrote:

I've actually had to back out of using the meta-data Kannel, as we
spotted what looks to be a problem in the way Kannel handles
carriage return characters (ctrl-M) in synchronous responses to MO
HTTP requests - the carriage returns arrive at smsbox fine, but by
the time they leave the bearerbox in an SMPP PDU, they somehow end
up actually mangling the MT message body (text after the carriage
returns is missing, subsequent text segments overwrite the start of
the message body, etc.). Asynchronous MT HTTP requests aren't
affected...

Not 100% sure whether it's a Kannel problem or something else (site-
specific) - I'll do a bit more investigation tomorrow morning and
get back to you ASAP.



When we tested the patched TLV-enabled branch of Kannel (hereby
referred to as *NEW* Kannel) a few days ago, everything worked fine,
except that some messages coming from one particular application
appeared to be being mangled by Kannel - from our kannel traffic log:

...
2008-01-22 03:10:27 SMS HTTP-request sender:XXX request: 'Azure
hotspot PIN^Mfrom Vodafone^MXXXXXXXX' url: 'http://
10.110.123.101:8404/?

msgData=Go&sourceAddr=XXX&channel=vodafonechmt&destinationAddr=19929873'

 reply: 200 '<< successful >>'
2008-01-22 03:10:28 Sent SMS [SMSC:vodafonechmt] [SVC:kannel_gw]
[ACT:] [BINF:] [from:19929873] [to:XXX] [flags:-1:0:-1:-1:-1] [msg:
23:Azure hotspot PIN.from ] [udh:0:] [id:
7936719e-1c7c-4e90-8161-6f14af4f23f7]
...

So, going in, the message is intact, but going out, it's had a '.'
character inserted, and seems to be truncated.

This wasn't a problem with our original Kannel instance (hereby
referred to as *OLD* Kannel) - Kannel bearerbox version
`cvs-20060727'. Build `Apr 12 2007 17:01:36', compiler `3.4.6
20060404 (Red Hat 3.4.6-3)'. System Linux, release
2.6.9-55.0.2.ELsmp, version #1 SMP Tue Jun 26 14:14:47 EDT 2007,
machine x86_64. Hostname smsgw2.appgw.mnetcorporation.com, IP
10.110.123.31. Libxml version 2.6.16. Using native malloc.

...
2008-01-22 00:43:57 SMS HTTP-request sender:XXX request: 'Azure
hotspot PIN^Mfrom Optus^MXXXXXXXX' url: 'http://10.110.123.101:8404/?
msgData=Go&sourceAddr=XXX&channel=optuschmt&destinationAddr=19929873'
reply: 200 '<< successful >>'
2008-01-22 00:43:57 Sent SMS [SMSC:optuschmt] [SVC:kannel_gw] [ACT:]
[BINF:] [from:19929873] [to:XXX] [flags:-1:0:-1:-1:-1] [msg:37:Azure
hotspot PIN^Mfrom Optus^MXXXXXXXX] [udh:0:] [id:86dfd47e-0106-4700-
ab48-47381c43047c]
...

Also, part of the problem seems to be triggered by the presence of
the carriage return characters (^M) and part seems to be their
presence in the HTTP response to an MO request.

In *OLD* Kannel, a separate MT request with the same text works fine:

...
2008-01-23 23:21:12 send-SMS request added - sender:kannelpush:test
10.110.123.101 target:XXX request: 'Azure hotspot PIN^Mfrom
Optus^MXXXXXXXXX'
2008-01-23 23:21:13 Sent SMS [SMSC:mbloxdirect] [SVC:kannelpush]
[ACT:] [BINF:] [from:test] [to:XXX] [flags:-1:0:-1:-1:-1] [msg:
37:Azure hotspot PIN^Mfrom Optus^MXXXXXXXX] [udh:0:] [id:09ba2907-
f013-46ff-9ec0-de006422d0b4]
...

But in *NEW* Kannel, a separate MT request with the same text still
has problems: it's not truncated any more, but it has '.' characters
instead of carriage returns:

...
2008-01-22 13:49:55 send-SMS request added - sender:kannelpush:
19996737 127.0.0.1 target:XXXX request: 'Azure hotspot PIN^Mfrom
Optus^MXXXXXXXX'
2008-01-22 13:49:56 Sent SMS [SMSC:mbloxpsms] [SVC:kannelpush] [ACT:]
[BINF:] [from:19996737] [to:XXX] [flags:-1:0:-1:-1:-1] [msg:37:Azure
hotspot PIN.from Optus.XXXXXXXX] [udh:0:]
[id:d9308e85-4790-4714-9ac9-48a72159ac31]
...

So, with a few extra debug/trace statements, I was able to narrow
down the mangling as possibly occurring in the
extract_msgdata_part_by_coding function, in gw/sms.c:



static Octstr *extract_msgdata_part_by_coding(Msg *msg, Octstr
*split_chars,
        int max_part_len)
{
    Octstr *temp = NULL, *temp_utf;

    if (msg->sms.coding == DC_8BIT || msg->sms.coding == DC_UCS2) {
        /* nothing to do here, just call the original
extract_msgdata_part */
        return extract_msgdata_part(msg->sms.msgdata, split_chars,
max_part_len);
    }

    /* convert to and the from gsm, so we drop all non GSM chars */
    charset_utf8_to_gsm(msg->sms.msgdata);
    charset_gsm_to_utf8(msg->sms.msgdata);

    /*
     * else we need to do something special. I'll just get
charset_gsm_truncate to
     * cut the string to the required length and then count real
characters.
     */
     temp = octstr_duplicate(msg->sms.msgdata);
     charset_utf8_to_gsm(temp);
     charset_gsm_truncate(temp, max_part_len);

     /* calculate utf-8 length */
     temp_utf = octstr_duplicate(temp);
     charset_gsm_to_utf8(temp_utf);
     max_part_len = octstr_len(temp_utf);

     octstr_destroy(temp);
     octstr_destroy(temp_utf);

     /* now just call the original extract_msgdata_part with the new
length */
     return extract_msgdata_part(msg->sms.msgdata, split_chars,
max_part_len);
}


This has changed noticeably from our *OLD* Kannel:


static Octstr *extract_msgdata_part_by_coding(Msg *msg, Octstr
*split_chars,
        int max_part_len)
{
    Octstr *temp = NULL;
    int pos, esc_count;

    if (msg->sms.coding == DC_8BIT || msg->sms.coding == DC_UCS2) {
        /* nothing to do here, just call the original
extract_msgdata_part */
        return extract_msgdata_part(msg->sms.msgdata, split_chars,
max_part_len);
    }

    /*
     * else we need to do something special. I'll just get
charset_gsm_truncate to
     * cut the string to the required length and then count real
characters.
     */
     temp = octstr_duplicate(msg->sms.msgdata);
     charset_latin1_to_gsm(temp);
     charset_gsm_truncate(temp, max_part_len);

     pos = esc_count = 0;

     while ((pos = octstr_search_char(temp, 27, pos)) != -1) {
        ++pos;
         ++esc_count;
     }

     octstr_destroy(temp);

     /* now just call the original extract_msgdata_part with the new
length */
     return extract_msgdata_part(msg->sms.msgdata, split_chars,
max_part_len - esc_count);
}


The biggest change is the use of UTF-8 as the internal encoding, I
think - however, one part of the mangling, the truncation, appears to
be happening in the extract_msgdata_part function. I jammed in some
extra traces, like so:



static Octstr *extract_msgdata_part(Octstr *msgdata, Octstr
*split_chars,
                                    int max_part_len)
{
debug("sms", 0, "X1: %s", octstr_get_cstr(msgdata));
    long i, len;
    Octstr *part;

debug("sms", 0, "split: %s", octstr_get_cstr(split_chars));
debug("sms", 0, "max_part_len: %i", max_part_len);
    len = max_part_len;
    if (split_chars != NULL) {
debug("sms", 0, "split_chars not null");
        for (i = max_part_len; i > 0; i--) {
debug("sms", 0, "working back from max_part_len: %i", i);
            if (octstr_search_char(split_chars,
                                   octstr_get_char(msgdata, i - 1),
0) != -1) {
debug("sms", 0, "something something, at char %i ->%c<-", i,
octstr_get_char(msgdata, i - 1));
                len = i;
                break;
            }
        }
    }
    part = octstr_copy(msgdata, 0, len);
debug("sms", 0, "len: %i", len);
debug("sms", 0, "X2: %s", octstr_get_cstr(part));
    octstr_delete(msgdata, 0, len);
    return part;
}


And got the following output:


2008-01-23 22:31:34 [17246] [5] DEBUG: X1: Azure hotspot PIN^Mfrom
Vodafone^MXXXXXXXX
2008-01-23 22:31:34 [17246] [5] DEBUG: split:
2008-01-23 22:31:34 [17246] [5] DEBUG: max_part_len: 41
2008-01-23 22:31:34 [17246] [5] DEBUG: split_chars not null
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 41
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 40
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 39
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 38
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 37
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 36
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 35
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 34
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 33
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 32
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 31
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 30
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 29
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 28
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 27
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 26
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 25
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 24
2008-01-23 22:31:34 [17246] [5] DEBUG: working back from
max_part_len: 23
2008-01-23 22:31:34 [17246] [5] DEBUG: something something, at char
23 -> <-
2008-01-23 22:31:34 [17246] [5] DEBUG: len: 23
2008-01-23 22:31:34 [17246] [5] DEBUG: X2: Azure hotspot PIN^Mfrom

2008-01-23 22:31:34 [17246] [5] DEBUG: Extract: Azure hotspotPIN^Mfrom

2008-01-23 22:31:34 [17246] [5] DEBUG: message length 41, sending 1
messages


So, that loop appears to be truncating the message text - but I'm not
sure why. Can someone explain that code? Is that behaviour incorrect?
(It looks like it to me)

As for the carriage returns becoming '.' characters, I haven't
located where that's occurring, but I'm wondering whether that might
be due to the switch to UTF8 internally?

Anyway, like I said, I'm a bit out of my depth here - does anyone
have any ideas? Is anyone else able to replicate this strange
behaviour with carriage returns in the meta_data branch? (or even the
latest CVS snapshot, since this doesn't seem to be TLV-specific...)

Thanks,

--
Giulio Harding
Systems Administrator

m.Net Corporation
Level 2, 8 Leigh Street
Adelaide SA 5000, Australia

Tel: +61 8 8210 2041
Fax: +61 8 8211 9620
Mobile: 0432 876 733
Yahoo: giulio.harding
MSN: [EMAIL PROTECTED]

http://www.mnetcorporation.com


--
Thanks,
Alex


--
Giulio Harding
Systems Administrator

m.Net Corporation
Level 2, 8 Leigh Street
Adelaide SA 5000, Australia

Tel: +61 8 8210 2041
Fax: +61 8 8211 9620
Mobile: 0432 876 733
MSN: [EMAIL PROTECTED]

http://www.mnetcorporation.com

Re: Kannel meta-data (TLV) branch mangles MTs with carriage return characters received via HTTP response to MO?

Reply via email to