You're the expert ;)
Speaking of GSM 03.38 characters, is this charset table valid:
http://www.dreamfabric.com/sms/default_alphabet.html ?
The problem is that (for the same example):
INFO: sendsms sender:<tester:4477> (127.0.0.1) to:<64804243> msg:<This
is á mèssage with áccents and làtín characteres like this: ñ -end of
message.>
I'm sending this to the SMSC:
...
DEBUG: data_coding: 0 = 0x00000000
DEBUG: sm_default_msg_id: 0 = 0x00000000
DEBUG: sm_length: 85 = 0x00000055
DEBUG: short_message:
DEBUG: Octet string at 0x819eb50:
DEBUG: len: 85
DEBUG: size: 86
DEBUG: immutable: 0
DEBUG: data: 54 68 69 73 20 69 73 20 3f 20 6d c3 a8 73 73 61   This is ? m..ssa
DEBUG: data: 67 65 20 77 69 74 68 20 3f 63 63 65 6e 74 73 20   ge with ?ccents
DEBUG: data: 61 6e 64 20 6c c3 a0 74 3f 6e 20 63 68 61 72 61   and l..t?n chara
DEBUG: data: 63 74 65 72 65 73 20 6c 69 6b 65 20 74 68 69 73   cteres like this
DEBUG: data: 3a 20 c3 b1 20 2d 65 6e 64 20 6f 66 20 6d 65 73   : .. -end of mes
DEBUG: data: 73 61 67 65 2e                                    sage.
DEBUG: Octet string dump ends.
DEBUG: SMPP PDU dump ends.
According to the document, the characters á and í can't be represented
in GSM 03.38, but is it normal to have c3 a8 for è (04 according to the
document), c3 a0 for à (7f according to the document) and c3 b1 for ñ
(7d according to the document)?
If the table is wrong, where can I get the correct one?
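For reference, c3 a8, c3 a0 and c3 b1 are the two-byte UTF-8 encodings of
è, à and ñ, so the dump above appears to show those characters still in
UTF-8 rather than converted to the GSM values from the table. A minimal
sketch to check this by hand follows. It is not Kannel code:
utf8_decode2, gsm_from_codepoint and check_dump_bytes are hypothetical
helpers, and the lookup covers only the three characters discussed here.

```c
#include <assert.h>

/* Decode one 2-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code point.
 * Returns -1 if the two bytes do not form a valid 2-byte sequence. */
int utf8_decode2(unsigned char b0, unsigned char b1)
{
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80)
        return -1;
    int cp = ((b0 & 0x1F) << 6) | (b1 & 0x3F);
    return cp >= 0x80 ? cp : -1;   /* reject overlong encodings */
}

/* Hypothetical, deliberately partial lookup: code point -> GSM 03.38
 * default alphabet value. Returns -1 for anything not listed. */
int gsm_from_codepoint(int cp)
{
    switch (cp) {
    case 0xE8: return 0x04;   /* è */
    case 0xE0: return 0x7F;   /* à */
    case 0xF1: return 0x7D;   /* ñ */
    default:   return -1;     /* e.g. á (U+00E1) and í (U+00ED) */
    }
}

/* Usage: the byte pairs seen in the short_message dump. */
void check_dump_bytes(void)
{
    assert(gsm_from_codepoint(utf8_decode2(0xC3, 0xA8)) == 0x04);
    assert(gsm_from_codepoint(utf8_decode2(0xC3, 0xA0)) == 0x7F);
    assert(gsm_from_codepoint(utf8_decode2(0xC3, 0xB1)) == 0x7D);
}
```

So the values in the table agree with the standard alphabet for these
three characters; the open question is why the conversion is not applied.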
I also uncommented the debug lines in the function charset_processing at
the end of smsbox.c; here's the result:
DEBUG: enter charset, coding=0, msgdata is This is á mèssage with
áccents and làtín characteres like this: ñ -end of message.
DEBUG: Octet string at 0x8160a30:
DEBUG: len: 82
DEBUG: size: 83
DEBUG: immutable: 0
DEBUG: data: 54 68 69 73 20 69 73 20 e1 20 6d e8 73 73 61 67   This is . m.ssag
DEBUG: data: 65 20 77 69 74 68 20 e1 63 63 65 6e 74 73 20 61   e with .ccents a
DEBUG: data: 6e 64 20 6c e0 74 ed 6e 20 63 68 61 72 61 63 74   nd l.t.n charact
DEBUG: data: 65 72 65 73 20 6c 69 6b 65 20 74 68 69 73 3a 20   eres like this:
DEBUG: data: f1 20 2d 65 6e 64 20 6f 66 20 6d 65 73 73 61 67   . -end of messag
DEBUG: data: 65 2e                                             e.
DEBUG: Octet string dump ends.
DEBUG: exit charset, coding=0, msgdata is This is á mèssage with
áccents and làtín characteres like this: ñ -end of message.
DEBUG: Octet string at 0x8160a30:
DEBUG: len: 88
DEBUG: size: 1024
DEBUG: immutable: 0
DEBUG: data: 54 68 69 73 20 69 73 20 c3 a1 20 6d c3 a8 73 73   This is .. m..ss
DEBUG: data: 61 67 65 20 77 69 74 68 20 c3 a1 63 63 65 6e 74   age with ..ccent
DEBUG: data: 73 20 61 6e 64 20 6c c3 a0 74 c3 ad 6e 20 63 68   s and l..t..n ch
DEBUG: data: 61 72 61 63 74 65 72 65 73 20 6c 69 6b 65 20 74   aracteres like t
DEBUG: data: 68 69 73 3a 20 c3 b1 20 2d 65 6e 64 20 6f 66 20   his: .. -end of
DEBUG: data: 6d 65 73 73 61 67 65 2e                           message.
DEBUG: Octet string dump ends.
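The lengths in these dumps are consistent with a plain ISO-8859-1 to UTF-8
conversion: the message has six accented characters, each growing from one
byte to two, so 82 bytes become 88. The 85-byte short_message in the PDU
dump further up then fits three of them (á, á, í) collapsing to a single
'?' each. A small sketch of that arithmetic, assuming nothing about
Kannel's own charset code (latin1_utf8_len and latin1_to_utf8 are
hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold an ISO-8859-1 buffer once re-encoded as UTF-8:
 * ASCII stays one byte, 0x80..0xFF becomes two bytes. */
size_t latin1_utf8_len(const unsigned char *s, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        out += (s[i] < 0x80) ? 1 : 2;
    return out;
}

/* Re-encode src (ISO-8859-1) into dst as UTF-8.
 * dst must have room for latin1_utf8_len(src, n) bytes.
 * Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0x80) {
            dst[o++] = src[i];
        } else {
            dst[o++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[o++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    return o;
}
```

Feeding it the first 16 bytes of the "enter charset" dump (two accented
characters) gives 18 bytes, with e1 turning into c3 a1 and e8 into c3 a8,
as in the "exit charset" dump.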
In gw/smsc/smsc_smpp.c, at line 869, we have:
if (msg->sms.coding == DC_7BIT || (msg->sms.coding == DC_UNDEF &&
    octstr_len(msg->sms.udhdata))) {
    /*
     * consider 3 cases:
     *  a) data_coding 0xFX: encoding should always be GSM 03.38 charset
     *  b) data_coding 0x00: encoding may be converted according to alt-charset
     *  c) data_coding 0x00: assume GSM 03.38 charset if alt-charset is not defined
     */
    if ((pdu->u.submit_sm.data_coding & 0xF0) ||
        (pdu->u.submit_sm.data_coding == 0 && !smpp->alt_charset)) {
        if (msg->sms.charset != NULL &&
            octstr_case_compare(msg->sms.charset,
                                octstr_imm("GSM-03.38")) != 0)
            charset_utf8_to_gsm(pdu->u.submit_sm.short_message);
    }
    else if (pdu->u.submit_sm.data_coding == 0 && smpp->alt_charset) {
        /*
         * convert to the given alternative charset
         */
        if (msg->sms.charset != NULL &&
            octstr_case_compare(msg->sms.charset,
                                octstr_imm("GSM-03.38")) == 0)
            charset_gsm_to_utf8(pdu->u.submit_sm.short_message);
        if (charset_convert(pdu->u.submit_sm.short_message,
                            SMPP_DEFAULT_CHARSET,
                            octstr_get_cstr(smpp->alt_charset)) != 0)
            error(0, "Failed to convert msgdata from charset <%s> to <%s>, will send as is.",
                  SMPP_DEFAULT_CHARSET,
                  octstr_get_cstr(smpp->alt_charset));
    }
}
charset_utf8_to_gsm() is only called when msg->sms.charset is set. I'm
echoing it just after the if and it's NULL. In gw/smsbox.c, function
smsbox_req_handle, the charset is used around line 2281 for the
conversion, but it's never assigned to the created message.
Should the if in smsc_smpp be modified:
    if ((pdu->u.submit_sm.data_coding & 0xF0) ||
        (pdu->u.submit_sm.data_coding == 0 && !smpp->alt_charset)) {
        if (msg->sms.charset != NULL &&
            octstr_case_compare(msg->sms.charset,
                                octstr_imm("GSM-03.38")) != 0)
            charset_utf8_to_gsm(pdu->u.submit_sm.short_message);
    }
since charset_utf8_to_gsm will also be called if msg->sms.charset is set
to ISO-8859-1 (or anything other than GSM-03.38)...
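To make the branch selection easier to follow, here is a small standalone
model of it. This is only a sketch, not Kannel code: pick_conversion, the
conv enum and equals_ignore_case are hypothetical names, plain C strings
stand in for Octstr, the outer DC_7BIT/DC_UNDEF test is left out, and the
extra charset_gsm_to_utf8 step inside the alt-charset branch is not
modeled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum conv { CONV_NONE, CONV_UTF8_TO_GSM, CONV_TO_ALT_CHARSET };

/* Case-insensitive string equality (stand-in for octstr_case_compare == 0). */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Mirrors only the if / else-if structure of the quoted block.
 * alt_charset and msg_charset may be NULL, meaning "not set". */
enum conv pick_conversion(int data_coding, const char *alt_charset,
                          const char *msg_charset)
{
    assert(data_coding >= 0 && data_coding <= 0xFF);  /* one-octet field */

    if ((data_coding & 0xF0) || (data_coding == 0 && alt_charset == NULL)) {
        if (msg_charset != NULL &&
            !equals_ignore_case(msg_charset, "GSM-03.38"))
            return CONV_UTF8_TO_GSM;
        return CONV_NONE;   /* charset unset, or already GSM-03.38 */
    }
    if (data_coding == 0 && alt_charset != NULL)
        return CONV_TO_ALT_CHARSET;
    return CONV_NONE;
}
```

With data_coding 0, no alt-charset and msg_charset NULL this returns
CONV_NONE, which matches what I see: the short_message leaves without
going through charset_utf8_to_gsm.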
On 5/18/07, Alexander Malysh <[EMAIL PROTECTED]> wrote:
Paco wrote:
> Thanks Alex,
>
> It works correctly now...
>
> Just one question: if you have an smsc capable of accepting other
> encodings (alt-charset) the utf-8 that we'll have internally for
> conversion to that alt charset (via iconv) will be with gsm03.38
> characters only... so we will still lose characters...
>
> Am I correct?
Yes, you are correct, but even if the operator accepts UTF-8, it doesn't
mean that they will send any non-GSM 03.38 chars over SS7 (I assume a GSM
network here).
>
> On 5/18/07, Alexander Malysh <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> Great example. The attached patch fixes this issue. Note: this bug only
>> shows up when you send chars that have no match in GSM 03.38 and will be
>> replaced with '?' by Kannel.
>>
>> Paco wrote:
>>
>> > Yes, for example, this is the text i'm sending in via http:
>> >
>> > $cat example-text
>> > This is á mèssage with àccents and làtín characteres like this: ñ -end
>> > of message.
>> > $ file example-text
>> > example-text: ISO-8859 text
>> >
>> > in the smsbox log i got:
>> > DEBUG: HTTP: Creating HTTPClient for `127.0.0.1'.
>> > DEBUG: HTTP: Created HTTPClient area 0x815e468.
>> > INFO: smsbox: Got HTTP request </cgi-bin> from <127.0.0.1>
>> > INFO: sendsms used by <tester>
>> > INFO: sendsms sender:<tester:4477> (127.0.0.1) to:<64804243> msg:<This
>> > is á mèssage with àccents and làtín characteres like this: ñ -end of
>> > message.>
>> > DEBUG: Stored UUID ed0cd995-5401-400f-bc2f-3163778d2daf
>> > DEBUG: message length 88, sending 1 messages
>> > DEBUG: Status: 202 Answer: <Sent.>
>> > DEBUG: Delayed reply - wait for bearerbox
>> > DEBUG: Got ACK (0) of ed0cd995-5401-400f-bc2f-3163778d2daf
>> > DEBUG: HTTP: Destroying HTTPClient area 0x815e468.
>> > DEBUG: HTTP: Destroying HTTPClient for `127.0.0.1'.
>> >
>> > nothing more on bearerbox log:
>> > DEBUG: boxc_receiver: sms received
>> > DEBUG: send_msg: sending msg to box: <127.0.0.1>
>> >
>> > and in the smsc logs:
>> > DEBUG: SMPP[test]: Sending PDU:
>> > DEBUG: SMPP PDU 0x819e488 dump:
>> > DEBUG: type_name: submit_sm
>> > DEBUG: command_id: 4 = 0x00000004
>> > DEBUG: command_status: 0 = 0x00000000
>> > DEBUG: sequence_number: 18 = 0x00000012
>> > DEBUG: service_type: NULL
>> > DEBUG: source_addr_ton: 2 = 0x00000002
>> > DEBUG: source_addr_npi: 1 = 0x00000001
>> > DEBUG: source_addr: "4477"
>> > DEBUG: dest_addr_ton: 2 = 0x00000002
>> > DEBUG: dest_addr_npi: 1 = 0x00000001
>> > DEBUG: destination_addr: "64804243"
>> > DEBUG: esm_class: 3 = 0x00000003
>> > DEBUG: protocol_id: 0 = 0x00000000
>> > DEBUG: priority_flag: 0 = 0x00000000
>> > DEBUG: schedule_delivery_time: NULL
>> > DEBUG: validity_period: NULL
>> > DEBUG: registered_delivery: 0 = 0x00000000
>> > DEBUG: replace_if_present_flag: 0 = 0x00000000
>> > DEBUG: data_coding: 0 = 0x00000000
>> > DEBUG: sm_default_msg_id: 0 = 0x00000000
>> > DEBUG: sm_length: 86 = 0x00000056
>> > DEBUG: short_message:
>> > DEBUG: Octet string at 0x819e170:
>> > DEBUG: len: 86
>> > DEBUG: size: 87
>> > DEBUG: immutable: 0
>> > DEBUG: data: 54 68 69 73 20 69 73 20 c3 a1 20 6d c3 a8 73 73   This is .. m..ss
>> > DEBUG: data: 61 67 65 20 77 69 74 68 20 c3 a0 63 63 65 6e 74   age with ..ccent
>> > DEBUG: data: 73 20 61 6e 64 20 6c c3 a0 74 c3 ad 6e 20 63 68   s and l..t..n ch
>> > DEBUG: data: 61 72 61 63 74 65 72 65 73 20 6c 69 6b 65 20 74   aracteres like t
>> > DEBUG: data: 68 69 73 3a 20 c3 b1 20 2d 65 6e 64 20 6f 66 20   his: .. -end of
>> > DEBUG: data: 6d 65 73 73 61 67                                 messag
>> > DEBUG: Octet string dump ends.
>> > DEBUG: SMPP PDU dump ends.
>> >
>> > the encoding of my console is ISO-8859-1 and the dump of the http
>> > request is:
>> >
>> > GET /cgi-bin?username=tester&password=foobar&from=4477&to=64804243&text=This+is+%E1+m%E8ssage+with+%E1ccents+and+l%E0t%EDn+characteres+like+this%3A+%F1+-end+of+message.&smsc=test&charset=ISO-8859-1 HTTP/1.0
>> > Connection: close
>> >
>> >
>> > On 5/18/07, Stipe Tolj <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Paco wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > I'm testing the latest CVS version, and there seems to be a problem
>> >> > with the encoding. I'm sending an HTTP request with a message to the
>> >> > smsbox (users-sms); the message is in ISO-8859-1 with charset set.
>> >> > The smsbox converts it to UTF-8 correctly, but then, just when the
>> >> > smsbox is ready to send the message to the bearerbox, the message
>> >> > gets truncated (for messages with accents and special characters).
>> >> >
>> >> > I'm browsing the code and it seems that there's a problem in
>> >> > extract_msgdata_part_by_coding. Correct me if I'm wrong, but that
>> >> > function's job is to extract the exact size of the message (since it
>> >> > will vary from UTF-8 to GSM) and use it to split the message. It
>> >> > seems that the size before converting it from UTF-8 to GSM is
>> >> > different from the size after the conversion from GSM to UTF-8;
>> >> > however, I haven't tested it much...
>> >>
>> >> can you show us a sample of the bug, so we can reproduce?
>> >>
>> >> Stipe
>> >>
>> >> - -------------------------------------------------------------------
>> >> Kölner Landstrasse 419
>> >> 40589 Düsseldorf, NRW, Germany
>> >>
>> >> tolj.org system architecture Kannel Software Foundation (KSF)
>> >> http://www.tolj.org/ http://www.kannel.org/
>> >>
>> >> mailto:st_{at}_tolj.org mailto:stolj_{at}_kannel.org
>> >> - -------------------------------------------------------------------
>> >>
>>
>> --
>> Thanks,
>> Alex
>>
--
Thanks,
Alex