Hi once again,

Realized that I forgot a few things when I split the patch up.

In the original unified patch I used charset_convert rather than octstr_recode to convert between charsets in smsbox. I believe my reason was that octstr_recode would translate € (euro) into an XML character reference (&#XXXX;) instead of \200, which is the euro sign in the CP1252 charset.
So I've attached a revised version of the charset patch.
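
To illustrate the \200 remark above, here is a minimal sketch of where the euro sign lives in CP1252 and why ISO-8859-1 cannot carry it. The function names are hypothetical and not part of Kannel, and the 0x80-0x9F table is deliberately partial.

```c
#include <assert.h>
#include <stdint.h>

/* Map a Unicode code point to its CP1252 byte, or return -1.
 * Illustration only: just a handful of the 0x80-0x9F slots are listed,
 * everything else in that range is reported as unmapped by this sketch. */
int cp1252_from_unicode(uint32_t cp)
{
    assert(cp <= 0x10FFFF);             /* valid Unicode range */
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return (int)cp;                 /* identical to ISO-8859-1 here */
    switch (cp) {
    case 0x20AC: return 0x80;           /* euro sign, octal \200 */
    case 0x2026: return 0x85;           /* horizontal ellipsis */
    case 0x2018: return 0x91;           /* left single quotation mark */
    case 0x2019: return 0x92;           /* right single quotation mark */
    case 0x2122: return 0x99;           /* trade mark sign */
    default:     return -1;
    }
}

/* ISO-8859-1 covers exactly U+0000..U+00FF, so the euro has no byte. */
int latin1_from_unicode(uint32_t cp)
{
    return cp <= 0xFF ? (int)cp : -1;
}
```

With these helpers, U+20AC maps to byte 0x80 (\200) under CP1252, while the ISO-8859-1 lookup reports it as unrepresentable.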


Another thing I had fixed but totally forgot about: alphanumeric originators were not converted into the GSM charset. kannel.smpp.alpha_gsm.diff does that. Unfortunately this causes a problem when using @: in the GSM charset @ is encoded as 0x00, which terminates the NUL-terminated source_addr field within the SUBMIT_SM PDU. Of course, when alt_charset is set, that charset will be used instead.
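
The @ problem can be reproduced in a few lines. This is only a sketch with hypothetical helper names, not the Kannel conversion routine: it handles just the three printable ASCII characters that have a different code in the GSM 03.38 default alphabet and assumes the originator otherwise contains only letters and digits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a buffer in place to GSM 03.38 codes (sketch: only '@', '$', '_'). */
void latin1_to_gsm_sketch(char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        switch (buf[i]) {
        case '@': buf[i] = 0x00; break;   /* GSM code for '@' is NUL */
        case '$': buf[i] = 0x02; break;
        case '_': buf[i] = 0x11; break;
        default:  break;                  /* letters and digits are unchanged */
        }
    }
}

/* Length as seen by anything that treats the field as a NUL-terminated
 * string, which is how a C-Octet String field is read. */
size_t c_octet_string_len(const char *buf)
{
    return strlen(buf);
}
```

For the originator "info@shop" the buffer still holds nine octets after conversion, but a NUL-terminated reader sees only "info".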


Best regards

Peter Christensen

Developer
------------------
Cool Systems ApS

Tel: +45 2888 1600
 @ : [EMAIL PROTECTED]
www: www.coolsystems.dk


Peter Christensen wrote:
Hi again,

Here's the lot split into logical pieces (and synchronized with today's CVS).


kannel.mysql.escape.diff:
 * Escapes strings in dlr_mysql_add INSERT query.


kannel.smpp.cp1252.diff:
 * Changed internal non-unicode character set to CP1252 in smsbox and SMPP


kannel.smpp.errors.diff:
 * Added recognition of all possible SMPP errors (including SMPPv5)


kannel.mblox.errors.diff:
 * Added recognition of all mBlox SMPP errors


kannel.smpp.pack_udh.diff:
 * Added pack_udh option to SMPP. When set, the message is packed with the UDH into binary data when sent.
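
The body of the pack_udh patch is not included in this mail, so the following is not the patch itself. It is a generic sketch of GSM 03.38 septet packing, the step needed to turn 7-bit text into binary data that can follow a UDH. The function names are hypothetical; fill_bits is the padding that makes the first septet start on a septet boundary after a UDH.

```c
#include <assert.h>
#include <stddef.h>

/* Pack n septets (7-bit GSM codes) into octets, least significant bit first.
 * fill_bits zero bits (0..6) are emitted before the first septet.
 * out must have room for (fill_bits + 7 * n + 7) / 8 octets.
 * Returns the number of octets written. */
size_t gsm7_pack(const unsigned char *septets, size_t n,
                 int fill_bits, unsigned char *out)
{
    unsigned int acc = 0;    /* bit accumulator */
    int bits = fill_bits;    /* number of valid bits in acc */
    size_t out_len = 0;

    assert(fill_bits >= 0 && fill_bits < 7);
    for (size_t i = 0; i < n; i++) {
        acc |= (unsigned int)(septets[i] & 0x7F) << bits;
        bits += 7;
        while (bits >= 8) {              /* flush every complete octet */
            out[out_len++] = (unsigned char)(acc & 0xFF);
            acc >>= 8;
            bits -= 8;
        }
    }
    if (bits > 0)                        /* flush the partial last octet */
        out[out_len++] = (unsigned char)(acc & 0xFF);
    return out_len;
}

/* Fill bits needed after a UDH of udh_octets octets (length octet included). */
int udh_fill_bits(size_t udh_octets)
{
    return (int)((7 - (udh_octets * 8) % 7) % 7);
}
```

Packing the five septets of "hello" without fill bits gives the octets E8 32 9B FD 06. A 6-octet concatenation UDH needs one fill bit before the text.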


Everything but the pack_udh patch has been thoroughly tested with millions of messages for several months. pack_udh has only been tested with some 100-200 different messages.


Best regards

Peter Christensen

Developer
------------------
Cool Systems ApS

Tel: +45 2888 1600
 @ : [EMAIL PROTECTED]
www: www.coolsystems.dk


Stipe Tolj wrote:
Peter Christensen wrote:

Hi,

I'd like to address a couple of things now that a new Kannel release is near:


1. Some time ago, I reported problems within the dlr_mysql_add function. If entry->timestamp, entry->source, or entry->url contains some unfortunate characters (most significantly <'>), the SQL query gets broken and the DLRs are lost. After a while, a first patch was submitted, but as it used mysql_real_escape_string, it would potentially require an additional MySQL connection (or something like that; I don't remember what the exact problem was), so it was not committed, and another patch was promised in the near future. Apparently that patch never came, however, and I see that the current CVS is still not escaping the strings.

correct, I have that escaping version of mysql here and it's scheduled for commit to 1.5.0 devel, since I won't have the time to test it that extensively to ensure stability for 1.4.1 stable.

Any thoughts on this?

I could post the patch and let the list confirm via votes that it should or should not go to 1.4.1 stable?

So, question is: What is the cause of the delay, and are you interested in my own patch, which uses mysql_escape_string instead of mysql_real_escape_string?

The cause is lack of time... at least from my side... I'm pushing, but I have several "construction sites" on the Kannel side, including the more "important" WTP-SAR issue.

If you have ready code for that, post it to the mailing list. I will do the same, so we can leverage work time.

I admit that problems related to this are somewhat rare (the most likely cause of errors would be an SMS with <'> in the originator string), but since it can be fixed relatively easily, I see no reason NOT to do it. Although mysql_escape_string does not look at the character set (unlike mysql_real_escape_string), I still believe that using it is better than doing nothing.

absolutely agreeing... following the main strategy: stability and safety.
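
For readers who want to see what the escaping discussed above amounts to, here is a self-contained sketch that escapes the same characters mysql_escape_string handles (NUL, newline, carriage return, backslash, single quote, double quote and Ctrl-Z). The function name is hypothetical; this is not the code from kannel.mysql.escape.diff and, like mysql_escape_string, it ignores the connection character set.

```c
#include <assert.h>
#include <stddef.h>

/* Escape len bytes of from into to, which must hold 2 * len + 1 bytes.
 * Returns the length of the escaped, NUL-terminated result. */
size_t sql_escape_sketch(char *to, const char *from, size_t len)
{
    char *p = to;

    assert(to != NULL && from != NULL);
    for (size_t i = 0; i < len; i++) {
        char esc = 0;                    /* 0 means: copy the byte unchanged */
        switch (from[i]) {
        case '\0':   esc = '0';  break;
        case '\n':   esc = 'n';  break;
        case '\r':   esc = 'r';  break;
        case '\\':   esc = '\\'; break;
        case '\'':   esc = '\''; break;
        case '"':    esc = '"';  break;
        case '\032': esc = 'Z';  break;  /* Ctrl-Z */
        default:     break;
        }
        if (esc) {
            *p++ = '\\';
            *p++ = esc;
        } else {
            *p++ = from[i];
        }
    }
    *p = '\0';
    return (size_t)(p - to);
}
```

An originator such as O'Neil becomes O\'Neil, so it can no longer close the quoted value in the INSERT statement early.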

2. I've noticed that the charset_gsm_to_latin1 and charset_latin1_to_gsm functions actually use windows-1252 as character set, but at least the SMPP gateway uses iso-8859-1 internally, which practically removes support for the € (euro) sign. This is probably also a problem within other gateways. If interested, I can supply a patch.

yep... Please comment the patch in detail, so that anyone of us who is reviewing has an easier job.

But in relation to that, there is one thing which has begun to annoy me. By restricting to the windows-1252 character set when transmitting with the GSM character set, you remove support for the 10 Greek characters which are supported by GSM. And as always when something is potentially possible, there will be some large annoying customer who wants support for that particular feature. This last bit is only an observation and nothing more. I realize that fixing this (preferably by using UTF-8 or UCS-2BE as internal character set regardless of the output character set) would require a significant amount of recoding within almost every part of the Kannel software package.

yep.
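
For reference, the ten Greek capitals in the GSM 03.38 default alphabet can be listed with their Unicode code points. The table and function names below are hypothetical and only meant as a sketch; every entry lies above U+00FF, so none fits in ISO-8859-1, and windows-1252 has no slot for them either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct gsm_greek {
    unsigned char gsm;      /* code in the GSM 03.38 default alphabet */
    uint16_t unicode;       /* corresponding Unicode code point */
};

static const struct gsm_greek gsm_greek_table[] = {
    { 0x10, 0x0394 },       /* Delta */
    { 0x12, 0x03A6 },       /* Phi */
    { 0x13, 0x0393 },       /* Gamma */
    { 0x14, 0x039B },       /* Lambda */
    { 0x15, 0x03A9 },       /* Omega */
    { 0x16, 0x03A0 },       /* Pi */
    { 0x17, 0x03A8 },       /* Psi */
    { 0x18, 0x03A3 },       /* Sigma */
    { 0x19, 0x0398 },       /* Theta */
    { 0x1A, 0x039E },       /* Xi */
};

#define GSM_GREEK_COUNT (sizeof gsm_greek_table / sizeof gsm_greek_table[0])

/* Return the Unicode code point for a Greek GSM code, or 0 if the code
 * is not one of the ten Greek capitals. */
uint16_t gsm_greek_to_unicode(unsigned char gsm)
{
    assert(gsm <= 0x7F);    /* GSM default alphabet codes are septets */
    for (size_t i = 0; i < GSM_GREEK_COUNT; i++) {
        if (gsm_greek_table[i].gsm == gsm)
            return gsm_greek_table[i].unicode;
    }
    return 0;
}
```

An internal representation based on UTF-8 or UCS-2, as suggested above, would be able to carry these code points unchanged.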

The reason why I have not attached any patches to this mail (although I have made quite a few changes) is that my own patch is really a combination of several patches, which are not in sync with the CVS. But if people are interested, I will update the patch and post the relevant bits.

yes, please try to keep clean logical patchsets. Which means one patchset = one logical change. It's too hard to trace several issues in just one single patchset.

The whole patch does the following:
 SMPP:
  * Add support for mBlox operator and billing identifier (Not my own work)
  * Add support for ALL SMPPv5 error codes, including mBlox specific codes. (That is, recognize them and translate to human readable text)
  * Use CP1252 instead of ISO-8859-1 as internal charset
  * Added pack_udh parameter. When set, messages are sent as packed GSM data when UDH is present. (A few gateways require this)

 MySQL:
  * Escapes strings in dlr_mysql_add

 run_kannel_box:
  * Added waitpid after kill to avoid false terminations (the init script reports that kannel is terminated while bearerbox is actually stuck within a connect call; this happens relatively often). With later CVS releases this didn't seem to work, so I've added some functionality to the init script instead.
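
The kill-then-waitpid idea can be sketched as follows. This is a generic POSIX example with a hypothetical function name, not the actual change to run_kannel_box: the point is that kill() returning only means the signal was sent, while waitpid() returning means the child has really terminated.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGTERM to a child process and block until it has been reaped.
 * Returns 0 on success and stores the wait status in *status,
 * returns -1 if the signal could not be sent or the wait failed. */
int terminate_and_reap(pid_t pid, int *status)
{
    pid_t r;

    assert(pid > 0);                 /* pid 0 or -1 would signal a group */
    if (kill(pid, SIGTERM) == -1)
        return -1;
    do {
        r = waitpid(pid, status, 0); /* retry if interrupted by a signal */
    } while (r == -1 && errno == EINTR);
    return r == pid ? 0 : -1;
}
```

Only after terminate_and_reap() returns 0 is it safe to report the child as stopped. A process that stays stuck keeps the caller waiting instead of being reported as terminated too early.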

Stipe


diff -Nru gateway/gw/smsc/smsc_smpp.c gateway.new/gw/smsc/smsc_smpp.c
--- gateway/gw/smsc/smsc_smpp.c	2005-12-18 21:21:16.000000000 +0100
+++ gateway.new/gw/smsc/smsc_smpp.c	2006-01-10 17:20:13.000000000 +0100
@@ -370,6 +370,7 @@
 
         break;
     case GSM_ADDR_TON_ALPHANUMERIC:
+		charset_latin1_to_gsm(addr);
         if (octstr_len(addr) > 11) {
             /* alphanum sender, max. allowed length is 11 (according to GSM specs) */
             error(0, "SMPP[%s]: Mallformed addr `%s', alphanum length greater 11 chars. ",
@@ -747,6 +748,14 @@
             if (!octstr_check_range(pdu->u.submit_sm.source_addr, 1, 256, gw_isdigit)) {
                 pdu->u.submit_sm.source_addr_ton = GSM_ADDR_TON_ALPHANUMERIC; /* alphanum */
                 pdu->u.submit_sm.source_addr_npi = GSM_ADDR_NPI_UNKNOWN;    /* short code */
+				if (smpp->alt_charset) {
+					if (charset_convert(pdu->u.submit_sm.source_addr, SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset)) != 0)
+						error(0, "Failed to convert source_addr from charset <%s> to <%s>, will send as is.",
+								SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset));
+				}
+				else
+					charset_latin1_to_gsm(pdu->u.submit_sm.source_addr);
+
             } else {
                /* numeric sender address with + in front -> international (remove the +) */
                octstr_delete(pdu->u.submit_sm.source_addr, 0, 1);
@@ -756,6 +765,14 @@
             if (!octstr_check_range(pdu->u.submit_sm.source_addr,0, 256, gw_isdigit)) {
                 pdu->u.submit_sm.source_addr_ton = GSM_ADDR_TON_ALPHANUMERIC;
                 pdu->u.submit_sm.source_addr_npi = GSM_ADDR_NPI_UNKNOWN;
+				if (smpp->alt_charset) {
+					if (charset_convert(pdu->u.submit_sm.source_addr, SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset)) != 0)
+						error(0, "Failed to convert source_addr from charset <%s> to <%s>, will send as is.",
+								SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset));
+				}
+				else
+					charset_latin1_to_gsm(pdu->u.submit_sm.source_addr);
+
             }
         }
     }
diff -Nru gateway/gw/smsbox.c gateway.new/gw/smsbox.c
--- gateway/gw/smsbox.c	2005-12-09 03:14:31.000000000 +0100
+++ gateway.new/gw/smsbox.c	2006-01-10 15:57:05.000000000 +0100
@@ -3654,9 +3654,9 @@
 
 	if (coding == DC_7BIT) {
 	    /*
-	     * For 7 bit, convert to ISO-8859-1
+	     * For 7 bit, convert to CP1252
 	     */
-	    if (octstr_recode (octstr_imm ("ISO-8859-1"), charset, body) < 0) {
+	    if (charset_convert (body, octstr_get_cstr(charset), "CP1252") < 0) {
 		resultcode = -1;
 	    }
 	} else if (coding == DC_UCS2) {
diff -Nru gateway/gw/smsc/smsc_smpp.c gateway.new/gw/smsc/smsc_smpp.c
--- gateway/gw/smsc/smsc_smpp.c	2005-12-18 21:21:16.000000000 +0100
+++ gateway.new/gw/smsc/smsc_smpp.c	2006-01-10 15:58:41.000000000 +0100
@@ -78,6 +78,8 @@
 #include "sms.h"
 #include "dlr.h"
 
+#define SMPP_CHARSET	"CP1252"
+
 /*
  * Select these based on whether you want to dump SMPP PDUs as they are
  * sent and received or not. Not dumping should be the default in at least
@@ -490,9 +492,9 @@
              * unless it was specified binary, ie. UDH indicator was detected
              */
             if (smpp->alt_charset && msg->sms.coding != DC_8BIT) {
-                if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), "ISO-8859-1") != 0)
+                if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET) != 0)
                     error(0, "Failed to convert msgdata from charset <%s> to <%s>, will leave as is.",
-                             octstr_get_cstr(smpp->alt_charset), "ISO-8859-1");
+                             octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET);
                 msg->sms.coding = DC_7BIT;
             } else { /* assume GSM 03.38 7-bit alphabet */
                 charset_gsm_to_latin1(msg->sms.msgdata);
@@ -640,9 +642,9 @@
              * unless it was specified binary, ie. UDH indicator was detected
              */
             if (smpp->alt_charset && msg->sms.coding != DC_8BIT) {
-                if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), "ISO-8859-1") != 0)
+                if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET) != 0)
                     error(0, "Failed to convert msgdata from charset <%s> to <%s>, will leave as is.",
-                             octstr_get_cstr(smpp->alt_charset), "ISO-8859-1");
+                             octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET);
                 msg->sms.coding = DC_7BIT;
             } else { /* assume GSM 03.38 7-bit alphabet */
                 charset_gsm_to_latin1(msg->sms.msgdata);
@@ -845,10 +847,10 @@
             /*
              * convert to the given alternative charset
              */
-            if (charset_convert(pdu->u.submit_sm.short_message, "ISO-8859-1",
+            if (charset_convert(pdu->u.submit_sm.short_message, SMPP_CHARSET,
                                 octstr_get_cstr(smpp->alt_charset)) != 0)
                 error(0, "Failed to convert msgdata from charset <%s> to <%s>, will send as is.",
-                             "ISO-8859-1", octstr_get_cstr(smpp->alt_charset));
+                             SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset));
         }
     }
 
