Re: dlr_mysql_add and internal charset

Peter Christensen Tue, 10 Jan 2006 08:29:22 -0800

Hi once again,

Realized that I forgot a few things when I split the patch up.

In the original unified patch I used charset_convert, not octstr_recode, to convert between charsets in smsbox. I believe my reason for doing this was that octstr_recode would translate € (euro) into an XML escape code (&#XXXX;) instead of \200, which is the euro sign in the CP1252 charset.
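To illustrate why the euro sign needs CP1252 rather than ISO-8859-1, here is a minimal, self-contained sketch (not Kannel code; the function names are hypothetical) that maps a Unicode code point to a single byte. It does not model charset_convert or octstr_recode; it only shows that U+20AC has the one-byte encoding 0x80 (octal \200) in CP1252 and no encoding at all in ISO-8859-1.

```c
#include <stdint.h>

/* Unicode code points for CP1252 bytes 0x80..0x9F. A 0 entry marks one
 * of the five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) that CP1252
 * leaves undefined. */
static const uint16_t cp1252_80_9f[32] = {
    0x20AC, 0x0000, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x0000, 0x017D, 0x0000,
    0x0000, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x0000, 0x017E, 0x0178
};

/* Returns the single CP1252 byte for a Unicode code point, or -1 if
 * CP1252 cannot represent it. */
int unicode_to_cp1252(uint32_t cp)
{
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return (int)cp;              /* same byte as in ISO-8859-1 */
    for (int i = 0; i < 32; i++)
        if (cp1252_80_9f[i] != 0 && cp1252_80_9f[i] == cp)
            return 0x80 + i;         /* e.g. U+20AC (euro) -> 0x80 */
    return -1;
}

/* ISO-8859-1 maps U+0000..U+00FF one-to-one, so anything above U+00FF,
 * including the euro sign, has no encoding at all. */
int unicode_to_latin1(uint32_t cp)
{
    return cp <= 0xFF ? (int)cp : -1;
}
```

An XML-style fallback such as &#8364; is what a converter typically emits when the target charset has no byte for a character; with CP1252 as the target, the euro never needs that fallback.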

So I've attached a revised version of the charset patch.

Another thing I had totally forgotten that I had fixed: alphanumeric originators were not converted into the GSM charset. kannel.smpp.alpha_gsm.diff does that. Unfortunately this causes problems when the originator contains @, because @ is code 0x00 in the GSM default alphabet and therefore terminates the NUL-terminated source_addr field within the SUBMIT_SM PDU. Of course, when alt_charset is set, that charset will be used instead.
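A minimal sketch of why @ is a problem. The mapping below is a hypothetical, deliberately partial Latin-1 to GSM 03.38 converter (not Kannel's charset_latin1_to_gsm) covering only a few characters, plus a helper showing how much of a NUL-terminated C-Octet String, such as SMPP's source_addr, a receiver would actually see.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately partial Latin-1 -> GSM 03.38 mapping for one
 * character. Returns -1 for anything outside this small subset. */
static int latin1_to_gsm_char(unsigned char c)
{
    if (c == '@') return 0x00;   /* '@' is code 0 in the GSM alphabet */
    if (c == '$') return 0x02;
    if (c == '_') return 0x11;
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9') || c == ' ' || c == '.' || c == '-' ||
        c == '+')
        return c;                /* same code as ASCII */
    return -1;
}

/* Converts len bytes of src into dst (which must hold len bytes).
 * Returns the number of bytes written, or -1 on an unsupported char. */
long latin1_to_gsm(const char *src, size_t len, unsigned char *dst)
{
    for (size_t i = 0; i < len; i++) {
        int g = latin1_to_gsm_char((unsigned char)src[i]);
        if (g < 0)
            return -1;
        dst[i] = (unsigned char)g;
    }
    return (long)len;
}

/* SMPP encodes source_addr as a C-Octet String (NUL-terminated), so the
 * receiver only sees the bytes before the first 0x00. */
size_t c_octet_visible_len(const unsigned char *buf, size_t len)
{
    const unsigned char *nul = memchr(buf, 0x00, len);
    return nul ? (size_t)(nul - buf) : len;
}
```

Converting "info@shop" yields a 0x00 at index 4, so an SMSC would read the originator as just "info".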



Med venlig hilsen / Best regards

Peter Christensen

Developer
------------------
Cool Systems ApS

Tel: +45 2888 1600
 @ : [EMAIL PROTECTED]
www: www.coolsystems.dk


Peter Christensen wrote:

Hi again,
Here's the lot, split into logical pieces (and synchronized with today's CVS).
kannel.mysql.escape.diff:
 * Escapes strings in dlr_mysql_add INSERT query.


kannel.smpp.cp1252.diff:
 * Changed internal non-unicode character set to CP1252 in smsbox and SMPP


kannel.smpp.errors.diff:
 * Added recognition of all possible SMPP errors (including SMPPv5)


kannel.mblox.errors.diff:
 * Added recognition of all mBlox SMPP errors


kannel.smpp.pack_udh.diff:
 * Added pack_udh option to SMPP. When set, the message is packed together with the UDH into binary data when sent.

Everything but the pack_udh patch has been thoroughly tested with millions of messages over several months. pack_udh has only been tested with some 100-200 different messages.
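The pack_udh patch itself is not included in this thread, so the following is only a generic sketch, assuming that "packed" means GSM 03.40 7-bit septet packing: the UDH bytes are copied first, then fill bits pad up to the next septet boundary so the text septets start aligned. The function name pack_with_udh is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Packs n GSM 7-bit characters after an optional UDH (udh_len bytes,
 * including the UDHL byte). Returns the number of bytes written to out,
 * or 0 if out_size is too small. */
size_t pack_with_udh(const unsigned char *udh, size_t udh_len,
                     const unsigned char *septets, size_t n,
                     unsigned char *out, size_t out_size)
{
    /* Fill bits so the first text septet starts on a septet boundary. */
    size_t fill = (7 - (udh_len * 8) % 7) % 7;
    size_t bitpos = udh_len * 8 + fill;
    size_t out_len = (bitpos + 7 * n + 7) / 8;

    if (out_len > out_size)
        return 0;
    memset(out, 0, out_len);
    if (udh_len > 0)
        memcpy(out, udh, udh_len);

    for (size_t i = 0; i < n; i++, bitpos += 7) {
        unsigned int v = septets[i] & 0x7F;
        size_t byte = bitpos / 8, shift = bitpos % 8;
        out[byte] |= (unsigned char)(v << shift);       /* low bits */
        if (shift > 1)                                   /* spills over */
            out[byte + 1] |= (unsigned char)(v >> (8 - shift));
    }
    return out_len;
}
```

Without a UDH, "hellohello" packs into the 9 bytes E8 32 9B FD 46 97 D9 EC 37; with a 6-byte concatenation UDH, one fill bit is inserted before the first septet.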


Stipe Tolj wrote:
Peter Christensen wrote:
Hi,

I'd like to address a couple of things now that a new Kannel release is near:
1. Some while ago, I reported problems within the dlr_mysql_add function. If entry->timestamp, entry->source, or entry->url contains some unfortunate characters (most significantly <'>), the SQL query breaks and the DLRs are lost. After a while, a first patch was submitted, but as it used mysql_real_escape_string, it would potentially require an additional MySQL connection (or something; I don't remember the exact problem), so it was not committed, and another patch was promised in the near future. Apparently this patch never came, however, and I see that the current CVS is still not escaping the strings.
Correct, I have that escaping version of the MySQL code here and it's scheduled for commit to 1.5.0 devel, since I won't have the time to test it extensively enough to ensure stability for 1.4.1 stable.
Any thoughts on this?
I could post the patch and let the list vote on whether it should go into 1.4.1 stable?
So, the question is: what is the cause of the delay, and are you interested in my own patch, which uses mysql_escape_string instead of mysql_real_escape_string?
The cause is lack of time, at least on my side. I'm pushing, but I have several "construction sites" on the Kannel side, including the more "important" WTP-SAR issue.
If you have code ready for that, post it to the mailing list. I will do the same, so we can leverage our work time.
I admit that problems related to this are somewhat rare (the most likely cause of errors would be an SMS with <'> in the originator string), but since it can be fixed relatively easily, I see no reason NOT to do it. Although mysql_escape_string does not take the character set into account (unlike mysql_real_escape_string), I still believe it is better to use it than to do nothing.
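For reference, here is a self-contained stand-in that escapes the same characters the MySQL C API documents for mysql_escape_string() (NUL, \n, \r, backslash, single and double quote, and Ctrl-Z). It is not the libmysqlclient function and not either of the patches discussed here; the name escape_sql_string is hypothetical. Like mysql_escape_string(), it ignores the connection character set.

```c
#include <stddef.h>

/* Escapes len bytes of from into to. As with mysql_escape_string(), the
 * caller must provide at least 2*len+1 bytes in to. Returns the length
 * of the escaped string (excluding the terminating NUL). */
size_t escape_sql_string(char *to, const char *from, size_t len)
{
    char *out = to;

    for (size_t i = 0; i < len; i++) {
        char esc = 0;
        switch (from[i]) {
        case '\0':   esc = '0';  break;
        case '\n':   esc = 'n';  break;
        case '\r':   esc = 'r';  break;
        case '\\':   esc = '\\'; break;
        case '\'':   esc = '\''; break;   /* the <'> that breaks the query */
        case '"':    esc = '"';  break;
        case '\032': esc = 'Z';  break;   /* Ctrl-Z */
        }
        if (esc) {
            *out++ = '\\';
            *out++ = esc;
        } else {
            *out++ = from[i];
        }
    }
    *out = '\0';
    return (size_t)(out - to);
}
```

In dlr_mysql_add, each of entry->timestamp, entry->source and entry->url would be passed through such a function before being placed between single quotes in the INSERT statement. mysql_real_escape_string(), by contrast, takes a MYSQL * connection handle so it can honour the connection's character set, which is presumably where the extra-connection concern came from.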
Absolutely agreed, following the main strategy: stability and safety.
2. I've noticed that the charset_gsm_to_latin1 and charset_latin1_to_gsm functions actually use windows-1252 as the character set, but at least the SMPP gateway uses ISO-8859-1 internally, which in practice removes support for the € (euro) sign. This is probably also a problem in other gateways. If you're interested, I can supply a patch.
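In GSM terms, the euro sign is not in the GSM 03.38 default alphabet at all but in the extension table, reached through the escape code 0x1B. The sketch below (the function name is hypothetical, and it handles only the extension characters) maps CP1252 bytes to those two-byte sequences. Under ISO-8859-1, byte 0x80 is a C1 control character rather than €, so an ISO-8859-1 pipeline never has a euro to map to 0x1B 0x65.

```c
#include <stddef.h>

#define GSM_ESC 0x1B   /* escape to the GSM 03.38 extension table */

/* Encodes one CP1252 byte that lives only in the GSM extension table.
 * Writes ESC + code to out and returns 2, or returns 0 if c is not one
 * of these characters (it would then go through the basic table). */
size_t cp1252_to_gsm_ext(unsigned char c, unsigned char out[2])
{
    unsigned char code;

    switch (c) {
    case 0x0C: code = 0x0A; break;   /* form feed */
    case '^':  code = 0x14; break;
    case '{':  code = 0x28; break;
    case '}':  code = 0x29; break;
    case '\\': code = 0x2F; break;
    case '[':  code = 0x3C; break;
    case '~':  code = 0x3D; break;
    case ']':  code = 0x3E; break;
    case '|':  code = 0x40; break;
    case 0x80: code = 0x65; break;   /* euro: byte 0x80 only in CP1252 */
    default:   return 0;
    }
    out[0] = GSM_ESC;
    out[1] = code;
    return 2;
}
```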
Yep. Please comment the patch in detail, so that whoever of us reviews it has an easier job.
But related to that, there is one thing which has begun to annoy me. By restricting to the windows-1252 character set when transmitting with the GSM character set, you remove support for the 10 Greek characters which are supported by GSM. And as always when something is potentially possible, there will be some large, annoying customer who wants support for that particular feature. This last bit is only an observation and nothing more. I realize that fixing this (preferably by using UTF-8 or UCS-2BE as the internal character set regardless of the output character set) would require a significant amount of recoding within almost every part of the Kannel software package.
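The ten Greek capitals sit at GSM 03.38 codes 0x10 and 0x12 to 0x1A (0x11 is the underscore). None of them exists in windows-1252, but each fits in two UTF-8 bytes, which is why a UTF-8 or UCS-2 internal charset would keep them. A minimal sketch with hypothetical function names:

```c
#include <stddef.h>
#include <stdint.h>

/* Unicode code points for GSM 03.38 codes 0x10..0x1A. */
static const uint16_t gsm_0x10_to_0x1a[11] = {
    0x0394,                                 /* 0x10 Delta */
    0x005F,                                 /* 0x11 underscore */
    0x03A6, 0x0393, 0x039B, 0x03A9, 0x03A0, /* Phi Gamma Lambda Omega Pi */
    0x03A8, 0x03A3, 0x0398, 0x039E          /* Psi Sigma Theta Xi */
};

/* Returns the Unicode code point for a GSM code in 0x10..0x1A, else 0. */
uint32_t gsm_greek_to_unicode(unsigned char gsm)
{
    if (gsm < 0x10 || gsm > 0x1A)
        return 0;
    return gsm_0x10_to_0x1a[gsm - 0x10];
}

/* Encodes a code point below U+0800 as UTF-8. Returns the byte count,
 * or 0 for code points that would need three or more bytes. */
size_t utf8_encode_2byte(uint32_t cp, unsigned char out[2])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp >= 0x800)
        return 0;
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
}
```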
yep.
The reason why I have not attached any patches to this mail (although I have made quite a few changes) is that my own patch is really a combination of several patches, which is not in sync with the CVS. But if people are interested, I will update the patch and post the relevant bits.
Yes, please try to keep clean logical patchsets, which means one patchset = one logical change. It's too hard to trace several issues in one single patchset.
The whole patch does the following:
 SMPP:
  * Add support for mBlox operator and billing identifier (not my own work)
  * Add support for ALL SMPPv5 error codes, including mBlox-specific codes (that is, recognize them and translate them to human-readable text)
  * Use CP1252 instead of ISO-8859-1 as internal charset
  * Added pack_udh parameter. When set, messages are sent as packed GSM data when a UDH is present. (A few gateways require this)
 MySQL:
  * Escapes strings in dlr_mysql_add

 run_kannel_box:
  * Added waitpid after kill to avoid false terminations (the init script reports that Kannel has terminated while bearerbox is actually stuck in a connect call, which happens relatively often). With later CVS releases this didn't seem to work, so I've added some functionality to the init script instead.
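The actual run_kannel_box change is not part of this thread. As a rough illustration of the kill-then-waitpid idea, here is a generic POSIX sketch (the function name terminate_and_wait and the 50 ms polling interval are assumptions) that only reports a process as stopped once it has actually been reaped, instead of assuming it died as soon as the signal was sent.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Sends SIGTERM to a child process and waits up to timeout_ms for it to
 * exit. Returns 0 once the child has been reaped, 1 on timeout (still
 * running, e.g. stuck in connect()), or -1 on error. */
int terminate_and_wait(pid_t pid, long timeout_ms)
{
    const struct timespec tick = { 0, 50 * 1000000L };   /* 50 ms */
    long waited = 0;

    if (kill(pid, SIGTERM) != 0)
        return -1;
    for (;;) {
        pid_t r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return 0;                 /* exited and reaped */
        if (r < 0 && errno != EINTR)
            return -1;
        if (waited >= timeout_ms)
            return 1;                 /* do not report it as stopped */
        nanosleep(&tick, NULL);
        waited += 50;
    }
}
```

waitpid() only works on the caller's own children, so this pattern fits a parent process like run_kannel_box; an init script that did not start the box itself presumably has to check for the process some other way.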
Stipe

diff -Nru gateway/gw/smsc/smsc_smpp.c gateway.new/gw/smsc/smsc_smpp.c
--- gateway/gw/smsc/smsc_smpp.c	2005-12-18 21:21:16.000000000 +0100
+++ gateway.new/gw/smsc/smsc_smpp.c	2006-01-10 17:20:13.000000000 +0100
@@ -370,6 +370,7 @@
 
         break;
     case GSM_ADDR_TON_ALPHANUMERIC:
+		charset_latin1_to_gsm(addr);
         if (octstr_len(addr) > 11) {
             /* alphanum sender, max. allowed length is 11 (according to GSM specs) */
             error(0, "SMPP[%s]: Mallformed addr `%s', alphanum length greater 11 chars. ",
@@ -747,6 +748,14 @@
             if (!octstr_check_range(pdu->u.submit_sm.source_addr, 1, 256, gw_isdigit)) {
                 pdu->u.submit_sm.source_addr_ton = GSM_ADDR_TON_ALPHANUMERIC; /* alphanum */
                 pdu->u.submit_sm.source_addr_npi = GSM_ADDR_NPI_UNKNOWN;    /* short code */
+				if (smpp->alt_charset) {
+					if (charset_convert(pdu->u.submit_sm.source_addr, SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset)) != 0)
+						error(0, "Failed to convert source_addr from charset <%s> to <%s>, will send as is.",
+								SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset));
+				}
+				else
+					charset_latin1_to_gsm(pdu->u.submit_sm.source_addr);
+
             } else {
                /* numeric sender address with + in front -> international (remove the +) */
                octstr_delete(pdu->u.submit_sm.source_addr, 0, 1);
@@ -756,6 +765,14 @@
             if (!octstr_check_range(pdu->u.submit_sm.source_addr,0, 256, gw_isdigit)) {
                 pdu->u.submit_sm.source_addr_ton = GSM_ADDR_TON_ALPHANUMERIC;
                 pdu->u.submit_sm.source_addr_npi = GSM_ADDR_NPI_UNKNOWN;
+				if (smpp->alt_charset) {
+					if (charset_convert(pdu->u.submit_sm.source_addr, SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset)) != 0)
+						error(0, "Failed to convert source_addr from charset <%s> to <%s>, will send as is.",
+								SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset));
+				}
+				else
+					charset_latin1_to_gsm(pdu->u.submit_sm.source_addr);
+
             }
         }
     }

diff -Nru gateway/gw/smsbox.c gateway.new/gw/smsbox.c
--- gateway/gw/smsbox.c	2005-12-09 03:14:31.000000000 +0100
+++ gateway.new/gw/smsbox.c	2006-01-10 15:57:05.000000000 +0100
@@ -3654,9 +3654,9 @@
 
 	if (coding == DC_7BIT) {
 	    /*
-	     * For 7 bit, convert to ISO-8859-1
+	     * For 7 bit, convert to CP1252
 	     */
-	    if (octstr_recode (octstr_imm ("ISO-8859-1"), charset, body) < 0) {
+	    if (charset_convert (body, octstr_get_cstr(charset), "CP1252") < 0) {
 		resultcode = -1;
 	    }
 	} else if (coding == DC_UCS2) {
diff -Nru gateway/gw/smsc/smsc_smpp.c gateway.new/gw/smsc/smsc_smpp.c
--- gateway/gw/smsc/smsc_smpp.c	2005-12-18 21:21:16.000000000 +0100
+++ gateway.new/gw/smsc/smsc_smpp.c	2006-01-10 15:58:41.000000000 +0100
@@ -78,6 +78,8 @@
 #include "sms.h"
 #include "dlr.h"
 
+#define SMPP_CHARSET	"CP1252"
+
 /*
  * Select these based on whether you want to dump SMPP PDUs as they are
  * sent and received or not. Not dumping should be the default in at least
@@ -490,9 +492,9 @@
              * unless it was specified binary, ie. UDH indicator was detected
              */
             if (smpp->alt_charset && msg->sms.coding != DC_8BIT) {
-                if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), "ISO-8859-1") != 0)
+                if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET) != 0)
                     error(0, "Failed to convert msgdata from charset <%s> to <%s>, will leave as is.",
-                             octstr_get_cstr(smpp->alt_charset), "ISO-8859-1");
+                             octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET);
                 msg->sms.coding = DC_7BIT;
             } else { /* assume GSM 03.38 7-bit alphabet */
                 charset_gsm_to_latin1(msg->sms.msgdata);
@@ -640,9 +642,9 @@
              * unless it was specified binary, ie. UDH indicator was detected
              */
             if (smpp->alt_charset && msg->sms.coding != DC_8BIT) {
-                if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), "ISO-8859-1") != 0)
+                if (charset_convert(msg->sms.msgdata, octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET) != 0)
                     error(0, "Failed to convert msgdata from charset <%s> to <%s>, will leave as is.",
-                             octstr_get_cstr(smpp->alt_charset), "ISO-8859-1");
+                             octstr_get_cstr(smpp->alt_charset), SMPP_CHARSET);
                 msg->sms.coding = DC_7BIT;
             } else { /* assume GSM 03.38 7-bit alphabet */
                 charset_gsm_to_latin1(msg->sms.msgdata);
@@ -845,10 +847,10 @@
             /*
              * convert to the given alternative charset
              */
-            if (charset_convert(pdu->u.submit_sm.short_message, "ISO-8859-1",
+            if (charset_convert(pdu->u.submit_sm.short_message, SMPP_CHARSET,
                                 octstr_get_cstr(smpp->alt_charset)) != 0)
                 error(0, "Failed to convert msgdata from charset <%s> to <%s>, will send as is.",
-                             "ISO-8859-1", octstr_get_cstr(smpp->alt_charset));
+                             SMPP_CHARSET, octstr_get_cstr(smpp->alt_charset));
         }
     }
