Hi once again,
I realized that I forgot a few things when I split the patch up.
In the original unified patch I used charset_convert, not octstr_recode,
to convert between charsets in smsbox. I believe my reason for doing
this was that octstr_recode would translate € (euro) into an XML escape
code (&#XXXX;) instead of \200, which is the euro sign in the CP1252 charset.
So I've attached a revised version of the charset patch.
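To illustrate the difference, here is a minimal standalone sketch (not Kannel code; both function names are made up for this example). It assumes a converter that falls back to a numeric character reference when the target charset has no slot for a character: ISO-8859-1 has none for U+20AC, while CP1252 maps it to byte 0x80 (octal \200).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: encode one Unicode code point as ISO-8859-1.
 * When the code point has no ISO-8859-1 byte, fall back to an XML
 * numeric character reference. Returns the number of bytes written
 * (excluding the terminating NUL). */
static int encode_latin1_with_ncr(unsigned long cp, char *out, size_t outlen)
{
    assert(out != NULL && outlen >= 16);
    if (cp <= 0xFF) {
        out[0] = (char)(unsigned char)cp;
        out[1] = '\0';
        return 1;
    }
    return snprintf(out, outlen, "&#%lu;", cp);
}

/* Hypothetical helper: encode one code point as a CP1252 byte.
 * This sketch only knows the euro sign plus the range CP1252 shares
 * with ISO-8859-1; it returns -1 for everything else. */
static int encode_cp1252(unsigned long cp)
{
    if (cp == 0x20AC)
        return 0x80; /* euro sign, \200 in octal */
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return (int)cp;
    return -1;
}
```

With the first helper the euro sign comes out as a seven-character escape, with the second as a single byte, which is what the rest of the 7-bit path expects.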
Another thing I totally forgot that I had fixed was that alphanumeric
originators were not converted into the GSM charset.
kannel.smpp.alpha_gsm.diff does that. Unfortunately this causes a
problem when the originator contains @: in the GSM charset @ is encoded
as 0x00, which terminates the NUL-terminated source_addr field
within the SUBMIT_SM PDU. Of course, when alt_charset is set, that
charset will be used instead.
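A small standalone sketch of the @ problem (the helper names are hypothetical and the mapping is deliberately partial, it is not the Kannel conversion table). In the GSM 03.38 default alphabet @ has code 0x00, so anything that reads the converted address as a NUL-terminated string stops there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, partial Latin-1 -> GSM 03.38 mapping. Only '@' and '$'
 * are remapped; a real table covers many more characters. */
static unsigned char latin1_to_gsm_char(unsigned char c)
{
    switch (c) {
    case '@': return 0x00;
    case '$': return 0x02;
    default:  return c; /* ASCII letters and digits keep their codes */
    }
}

/* Convert addr in place and return how many bytes a reader of a
 * NUL-terminated string would still see. */
static size_t gsm_visible_length(char *addr, size_t len)
{
    size_t i;
    const char *nul;

    assert(addr != NULL);
    for (i = 0; i < len; i++)
        addr[i] = (char)latin1_to_gsm_char((unsigned char)addr[i]);

    nul = memchr(addr, '\0', len);
    return nul ? (size_t)(nul - addr) : len;
}
```

An originator such as "Cool@Sys" is cut to "Cool" after conversion, while one without @ keeps its full length.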
Best regards
Peter Christensen
Developer
------------------
Cool Systems ApS
Tel: +45 2888 1600
@ : [EMAIL PROTECTED]
www: www.coolsystems.dk
Peter Christensen wrote:
Hi again,
Here's the lot split into logical pieces (and synchronized with today's
CVS).
kannel.mysql.escape.diff:
* Escapes strings in dlr_mysql_add INSERT query.
kannel.smpp.cp1252.diff:
* Changed internal non-unicode character set to CP1252 in smsbox and SMPP
kannel.smpp.errors.diff:
* Added recognition of all possible SMPP errors (including SMPPv5)
kannel.mblox.errors.diff:
* Added recognition of all mBlox SMPP errors
kannel.smpp.pack_udh.diff:
* Added pack_udh option to SMPP. When set, the message is packed with the
UDH into binary data when sent.
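For the two error-recognition patches in the list above, the idea is a lookup from the SMPP command_status value to readable text. The following is only a sketch with a handful of SMPP v3.4 codes; the table and function names are made up here and are not taken from the diffs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct smpp_error {
    unsigned long code;
    const char *text;
};

/* A few command_status values from SMPP v3.4; the full list is much
 * longer, and vendor-specific codes would be added the same way. */
static const struct smpp_error smpp_errors[] = {
    { 0x00000000UL, "No error" },
    { 0x00000001UL, "Message length is invalid" },
    { 0x00000002UL, "Command length is invalid" },
    { 0x00000003UL, "Invalid command ID" },
    { 0x0000000AUL, "Invalid source address" },
    { 0x0000000BUL, "Invalid destination address" },
    { 0x00000014UL, "Message queue full" },
    { 0x00000058UL, "Throttling error" },
};

/* Hypothetical helper: translate a command_status into text. */
static const char *example_smpp_error_text(unsigned long code)
{
    size_t i;

    for (i = 0; i < sizeof smpp_errors / sizeof smpp_errors[0]; i++) {
        if (smpp_errors[i].code == code)
            return smpp_errors[i].text;
    }
    return "Unknown or reserved error code";
}
```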
Everything but the pack_udh patch has been thoroughly tested with
millions of messages for several months. pack_udh has only been tested
with some 100-200 different messages.
Stipe Tolj wrote:
Peter Christensen wrote:
Hi,
I'd like to address a couple of things now that a new Kannel release is near:
1. Some while ago, I reported problems within the dlr_mysql_add
function. If the entry->timestamp, entry->source, or entry->url
contains some unfortunate characters (most significantly <'>), the
SQL query gets broken and the DLRs are wasted.
After a while, the first patch was submitted, but as it used
mysql_real_escape_string, it would potentially require an additional
MySQL connection (or something - don't remember what the exact
problem was), so it was not committed, and another patch was promised
in the near future. Apparently this patch never came, however, and I
see that the current CVS is still not escaping the strings.
Correct, I have that escaping version of the MySQL code here and it's
scheduled for commit to 1.5.0 devel, since I won't have the time to test it
extensively enough to ensure stability for 1.4.1 stable.
Any thoughts on this?
I could post the patch and let the list decide by vote whether it
should go into 1.4.1 stable?
So, question is: What is the cause of the delay, and are you
interested in my own patch, which uses mysql_escape_string instead of
mysql_real_escape_string?
The cause is lack of time... at least from my side... I'm pushing, but
I have several "construction sites" on the Kannel side, including the more
"important" WTP-SAR issue.
If you have ready code for that, post it to the mailing list. I will do
the same, so we can leverage work time.
I admit that problems related to this are somewhat rare (the most likely
cause of errors would be an SMS with <'> in the originator string),
but since it can be fixed relatively easily, I see no reason NOT to
do it. Although mysql_escape_string does not look at the character
set (unlike mysql_real_escape_string), I still believe that it is
better to use it than to do nothing.
Absolutely agreed... following the main strategy: stable and safe.
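For reference, a standalone sketch of the kind of escaping discussed above. It is not the patch itself, which would call mysql_escape_string from the MySQL C API; the function name here is made up, and the set of escaped characters follows what that API is documented to escape. The output buffer follows the same convention of at least 2*len+1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: escape `len` bytes of `in` for use inside a
 * single-quoted SQL string literal. `out` must hold at least
 * 2*len+1 bytes. Returns the length of the escaped string. */
static size_t sketch_sql_escape(char *out, const char *in, size_t len)
{
    size_t i, o = 0;

    assert(out != NULL && in != NULL);
    for (i = 0; i < len; i++) {
        char esc = 0;

        switch (in[i]) {
        case '\0':   esc = '0'; break;
        case '\n':   esc = 'n'; break;
        case '\r':   esc = 'r'; break;
        case '\032': esc = 'Z'; break; /* Ctrl-Z */
        case '\\':
        case '\'':
        case '"':    esc = in[i]; break;
        default:     break;
        }
        if (esc)
            out[o++] = '\\';
        out[o++] = esc ? esc : in[i];
    }
    out[o] = '\0';
    return o;
}
```

An originator like O'Neil then becomes O\'Neil and no longer closes the quoted value in the INSERT statement.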
2. I've noticed that the charset_gsm_to_latin1 and
charset_latin1_to_gsm functions actually use windows-1252 as their
character set, but at least the SMPP gateway uses iso-8859-1
internally, which practically removes support for the € (euro) sign.
This is probably also a problem within other gateways. If
interested, I can supply a patch.
Yep... Please comment the patch in detail, so that any of us
reviewing it has an easier job.
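To make the euro issue in point 2 concrete, here is a rough sketch (names invented for this mail, not Kannel code). In GSM 03.38 the euro sign is the escape sequence 0x1B 0x65; decoded with a windows-1252 table it becomes byte 0x80. If that byte is later treated as iso-8859-1, 0x80 is a C1 control code there and the euro is lost.

```c
#include <assert.h>
#include <stddef.h>

#define GSM_ESCAPE   0x1B /* introduces the GSM extension table */
#define GSM_EURO_EXT 0x65 /* euro sign in the extension table */
#define CP1252_EURO  0x80 /* euro sign in windows-1252 */

/* Hypothetical helper: copy GSM-encoded text to `out`, replacing the
 * euro escape sequence with the CP1252 euro byte. All other bytes are
 * copied unchanged, which is only right for characters that share
 * their code in both sets (letters, digits). `out` must hold at least
 * `len` bytes. Returns the number of bytes written. */
static size_t gsm_euro_to_cp1252(const unsigned char *in, size_t len,
                                 unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < len) {
        if (in[i] == GSM_ESCAPE && i + 1 < len && in[i + 1] == GSM_EURO_EXT) {
            out[o++] = CP1252_EURO;
            i += 2;
        } else {
            out[o++] = in[i++];
        }
    }
    return o;
}
```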
But in relation to that, there is one thing which has begun to annoy
me. By restricting to the windows-1252 character set when
transmitting with the GSM character set, you remove support for the
10 Greek characters which are supported by GSM. And as always when
something is potentially possible, there will be some large
annoying customer who wants support for that particular feature.
This last bit is only an observation and nothing more. I realize that
fixing this (preferably by using UTF-8 or UCS-2BE as the internal
character set regardless of the output character set) would require
a significant amount of recoding within almost every part of the
Kannel software package.
yep.
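For the record, these are the ten Greek capitals of the GSM 03.38 default alphabet with their Unicode code points, as a small sketch (table and helper names are made up). None of them has a windows-1252 byte, which is why a Unicode-based internal charset such as UCS-2BE would be needed to carry them.

```c
#include <assert.h>
#include <stddef.h>

struct gsm_greek {
    unsigned char gsm;    /* code in the GSM 03.38 default alphabet */
    unsigned int unicode; /* Unicode code point */
};

static const struct gsm_greek gsm_greek_table[] = {
    { 0x10, 0x0394 }, /* Delta  */
    { 0x12, 0x03A6 }, /* Phi    */
    { 0x13, 0x0393 }, /* Gamma  */
    { 0x14, 0x039B }, /* Lambda */
    { 0x15, 0x03A9 }, /* Omega  */
    { 0x16, 0x03A0 }, /* Pi     */
    { 0x17, 0x03A8 }, /* Psi    */
    { 0x18, 0x03A3 }, /* Sigma  */
    { 0x19, 0x0398 }, /* Theta  */
    { 0x1A, 0x039E }, /* Xi     */
};

/* Hypothetical helper: return the code point for a Greek GSM code,
 * or 0 if the code is not one of the ten Greek capitals. */
static unsigned int gsm_greek_to_unicode(unsigned char gsm)
{
    size_t i;

    for (i = 0; i < sizeof gsm_greek_table / sizeof gsm_greek_table[0]; i++) {
        if (gsm_greek_table[i].gsm == gsm)
            return gsm_greek_table[i].unicode;
    }
    return 0;
}

/* Store a code point below 0x10000 as two bytes, big-endian (UCS-2BE). */
static void ucs2be_put(unsigned int cp, unsigned char out[2])
{
    assert(cp <= 0xFFFF);
    out[0] = (unsigned char)(cp >> 8);
    out[1] = (unsigned char)(cp & 0xFF);
}
```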
The reason why I have not attached any patches to this mail
(although I have made quite a few changes) is that my own patch is
really a combination of several patches, which are not in sync with
CVS. But if people are interested, I will update the patch and
post the relevant bits.
yes, please try to keep clean logical patchsets. Which means one
patchset = one logical change. It's too hard to trace several issues
in just one single patchset.
The whole patch does the following:
SMPP:
* Add support for mBlox operator and billing identifier (Not my own
work)
* Add support for ALL SMPPv5 error codes, including mBlox specific
codes. (That is, recognize them and translate to human readable text)
* Use CP1252 instead of ISO-8859-1 as internal charset
* Added pack_udh parameter. When set, messages are sent as packed
GSM data when UDH is present. (A few gateways require this)
MySQL:
* Escapes strings in dlr_mysql_add
run_kannel_box:
* Added waitpid after kill to avoid false terminations (the init
script reports that kannel is terminated, while bearerbox is
actually stuck within a connect call - happens relatively often).
With later CVS releases this didn't seem to work, so instead I've
added some functionality to the init script.
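Regarding the pack_udh item in the SMPP list above: "packed GSM data" refers to the usual 7-bit packing, where each character takes 7 bits and the septets are laid out back to back in the octet stream. Below is a minimal sketch of that packing only (the function name is invented). It ignores the UDH itself, and the fill bits needed to align the text to a septet boundary after the header are left out, so it is not what the patch does in full.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pack `n` septets (one per input byte, low 7
 * bits used) into octets, least significant bits first. `out` must
 * hold (n*7+7)/8 bytes. Returns the number of octets written. */
static size_t pack_septets(const unsigned char *septets, size_t n,
                           unsigned char *out)
{
    size_t i, outlen = (n * 7 + 7) / 8;

    assert(septets != NULL && out != NULL);
    memset(out, 0, outlen);
    for (i = 0; i < n; i++) {
        size_t bitpos = i * 7;
        size_t byte = bitpos / 8;
        unsigned shift = (unsigned)(bitpos % 8);
        unsigned v = (unsigned)(septets[i] & 0x7F) << shift;

        out[byte] |= (unsigned char)(v & 0xFF);
        if (shift > 1) /* septet spills into the next octet */
            out[byte + 1] |= (unsigned char)(v >> 8);
    }
    return outlen;
}
```

The five characters of "hello" pack into the five octets E8 32 9B FD 06; eight characters would fit in seven octets.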
Stipe
diff -Nru gateway/gw/smsc/smsc_smpp.c gateway.new/gw/smsc/smsc_smpp.c
--- gateway/gw/smsc/smsc_smpp.c 2005-12-18 21:21:16.000000000 +0100
+++ gateway.new/gw/smsc/smsc_smpp.c 2006-01-10 17:20:13.000000000 +0100
@@ -370,6 +370,7 @@
break;
case GSM_ADDR_TON_ALPHANUMERIC:
+ charset_latin1_to_gsm(addr);
if (octstr_len(addr) > 11) {
/* alphanum sender, max. allowed length is 11 (according to GSM specs) */
error(0, "SMPP[%s]: Mallformed addr `%s', alphanum length greater 11 chars. ",
@@ -747,6 +748,14 @@
if (!octstr_check_range(pdu->u.submit_sm.source_addr, 1, 256, gw_isdigit)) {
pdu->u.submit_sm.source_addr_ton = GSM_ADDR_TON_ALPHANUMERIC; /* alphanum */
pdu->u.submit_sm.source_addr_npi = GSM_ADDR_NPI_UNKNOWN; /* short code */
+ if (smpp->alt_charset) {
+ if (charset_convert(pdu->u.submit_sm.source_addr, SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset)) != 0)
+ error(0, "Failed to convert source_addr from charset <%s> to <%s>, will send as is.",
+ SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset));
+ }
+ else
+ charset_latin1_to_gsm(pdu->u.submit_sm.source_addr);
+
} else {
/* numeric sender address with + in front -> international (remove the +) */
octstr_delete(pdu->u.submit_sm.source_addr, 0, 1);
@@ -756,6 +765,14 @@
if (!octstr_check_range(pdu->u.submit_sm.source_addr,0, 256, gw_isdigit)) {
pdu->u.submit_sm.source_addr_ton = GSM_ADDR_TON_ALPHANUMERIC;
pdu->u.submit_sm.source_addr_npi = GSM_ADDR_NPI_UNKNOWN;
+ if (smpp->alt_charset) {
+ if (charset_convert(pdu->u.submit_sm.source_addr, SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset)) != 0)
+ error(0, "Failed to convert source_addr from charset <%s> to <%s>, will send as is.",
+ SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset));
+ }
+ else
+ charset_latin1_to_gsm(pdu->u.submit_sm.source_addr);
+
}
}
}
diff -Nru gateway/gw/smsbox.c gateway.new/gw/smsbox.c
--- gateway/gw/smsbox.c 2005-12-09 03:14:31.000000000 +0100
+++ gateway.new/gw/smsbox.c 2006-01-10 15:57:05.000000000 +0100
@@ -3654,9 +3654,9 @@
if (coding == DC_7BIT) {
/*
- * For 7 bit, convert to ISO-8859-1
+ * For 7 bit, convert to CP1252
*/
- if (octstr_recode (octstr_imm ("ISO-8859-1"), charset, body) < 0) {
+ if (charset_convert (body, octstr_get_cstr(charset), "CP1252") < 0) {
resultcode = -1;
}
} else if (coding == DC_UCS2) {
diff -Nru gateway/gw/smsc/smsc_smpp.c gateway.new/gw/smsc/smsc_smpp.c
--- gateway/gw/smsc/smsc_smpp.c 2005-12-18 21:21:16.000000000 +0100
+++ gateway.new/gw/smsc/smsc_smpp.c 2006-01-10 15:58:41.000000000 +0100
@@ -78,6 +78,8 @@
#include "sms.h"
#include "dlr.h"
+#define SMPP_CHARSET "CP1252"
+
/*
* Select these based on whether you want to dump SMPP PDUs as they are
* sent and received or not. Not dumping should be the default in at least
@@ -490,9 +492,9 @@
* unless it was specified binary, ie. UDH indicator was detected
*/
if (smpp->alt_charset && msg->sms.coding != DC_8BIT) {
- if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), "ISO-8859-1") != 0)
+ if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET) != 0)
error(0, "Failed to convert msgdata from charset <%s> to <%s>, will leave as is.",
- octstr_get_cstr(smpp->alt_charset), "ISO-8859-1");
+ octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET);
msg->sms.coding = DC_7BIT;
} else { /* assume GSM 03.38 7-bit alphabet */
charset_gsm_to_latin1(msg->sms.msgdata);
@@ -640,9 +642,9 @@
* unless it was specified binary, ie. UDH indicator was detected
*/
if (smpp->alt_charset && msg->sms.coding != DC_8BIT) {
- if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), "ISO-8859-1") != 0)
+ if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET) != 0)
error(0, "Failed to convert msgdata from charset <%s> to <%s>, will leave as is.",
- octstr_get_cstr(smpp->alt_charset), "ISO-8859-1");
+ octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET);
msg->sms.coding = DC_7BIT;
} else { /* assume GSM 03.38 7-bit alphabet */
charset_gsm_to_latin1(msg->sms.msgdata);
@@ -845,10 +847,10 @@
/*
* convert to the given alternative charset
*/
- if (charset_convert(pdu->u.submit_sm.short_message, "ISO-8859-1",
+ if (charset_convert(pdu->u.submit_sm.short_message, SMPP_CHARSET,
octstr_get_cstr(smpp->alt_charset)) != 0)
error(0, "Failed to convert msgdata from charset <%s> to <%s>, will send as is.",
- "ISO-8859-1", octstr_get_cstr(smpp->alt_charset));
+ SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset));
}
}