Hi Peter,

you are right! I must be blind today :( OK, I will look through your patch once again and commit it.

Peter Christensen wrote:
> Hi Alex,
>
> The euro sign is 0x80 in windows-1252. It is iso-8859-15 which has the
> euro at the same place as the currency sign.
>
> gsm_to_latin1:
>
> static const struct {
>     int gsmesc;
>     int latin1;
> } gsm_escapes[] = {
>     { 10, 12 }, /* ASCII page break */
>     { 20, '^' },
>     { 40, '{' },
>     { 41, '}' },
>     { 47, '\\' },
>     { 60, '[' },
>     { 61, '~' },
>     { 62, ']' },
>     { 64, '|' },
>     { 101, 128 },
>     { -1, -1 }
> };
>
> 101, 128 is €: 0x1B 0x65 in GSM, 0x80 in windows-1252.
>
>
> latin1_to_gsm:
>
>
> 'x', 'y', 'z', -40, -64, -41, -61, NRP, /* 120 - 127 */
> -101, NRP, NRP, NRP, NRP, NRP, NRP, NRP, /* 128 - 135 */
>
> Once again, 0x80 is encoded as 0x1B 0x65.
>
>
> Med venlig hilsen / Best regards
>
> Peter Christensen
>
> Developer
> ------------------
> Cool Systems ApS
>
> Tel: +45 2888 1600
> @ : [EMAIL PROTECTED]
> www: www.coolsystems.dk
>
>
> Alexander Malysh wrote:
>> Hi Peter,
>>
>> I may be blind, but I don't see where gsm_to_latin1 and latin1_to_gsm
>> process the euro sign.
>>
>> gsm_to_latin1:
>> - euro sign is not in gsm_escapes
>> - euro sign will be converted to 'e' because esc will be deleted
>>
>> latin1_to_gsm:
>> - euro sign has the same code as CURRENCY SIGN and will be mapped to it
>> (see latin1_to_gsm array). Here is a snippet:
>> /* 160 - 167 */
>> ' ',
>> 64, /* Inverted ! */
>> 'c', /* approximation of cent marker */
>> 1, /* Pounds sterling */
>> 36, /* International currency symbol */
>> 3, /* Yen */
>> 64, /* approximate broken bar as inverted ! */
>> 95, /* Section marker */
>>
>> Does it all make sense to you, or have I overlooked anything?
>>
>> Thanks,
>> Alex
>>
>> Peter Christensen wrote:
>>
>>> Hi,
>>>
>>> The GSM charset has € as an escaped character (0x1B 0x65), and
>>> latin1_to_gsm() and gsm_to_latin1() assume the windows-1252 character
>>> set. So while I do admit that the patch is focused on SMPP, I doubt
>>> that it breaks any of the other protocols.
>>>
>>> If I go through each SMSC module:
>>>
>>> smsc_at.c: Never does any charset conversion, but uses latin1_to_gsm and
>>> gsm_to_latin1. So actually, this one already assumes windows-1252.
>>>
>>> smsc_cgw.c: Apparently already assumes windows-1252 (0x80 = €). Does no
>>> generic charset conversion.
>>>
>>> smsc_cimd.c: Uses iso-8859-1. This one will need patching.
>>>
>>> smsc_cimd2.c: iso-8859-1. Needs patching.
>>>
>>> smsc_emi.c: Uses latin1_to_gsm/gsm_to_latin1
>>>
>>> smsc_emi_x25.c: Uses its own gsm_to_iso function. The code looks kinda
>>> deprecated. No support for extended chars at all, apparently.
>>>
>>> smsc_fake.c:
>>>
>>> smsc_http.c: Seems to do no charset conversion
>>>
>>> smsc_ois.c: Uses latin1_to_gsm in some places, but a simplified
>>> gsm_to_iso88591 conversion elsewhere.
>>>
>>> smsc_oisd.c: Uses latin1_to_gsm/gsm_to_latin1
>>>
>>> smsc_sema.c: Uses a simplified gsm conversion like the one in smsc_ois.c
>>>
>>> smsc_smasi.c: Not sure what charset this assumes. There are no apparent
>>> charset conversions in place
>>>
>>> smsc_smpp.c: Uses latin1_to_gsm/gsm_to_latin1 and charset conversion.
>>> Currently the originator string is windows-1252 and the body is
>>> iso-8859-1.
>>>
>>> smsc_soap.c: Uses iso-8859-1
>>>
>>> smsc_wrapper.c: No apparent charset conversion
>>>
>>>
>>> My point is that while some protocols currently assume iso-8859-1, many
>>> use latin1_to_gsm/gsm_to_latin1, which is ALREADY windows-1252.
>>> Messages received from these gateways are windows-1252 as we speak,
>>> although the documentation says otherwise. But as long as smsbox uses
>>> iso-8859-1 and not windows-1252, no gateway can transmit the € character
>>> without manual escaping, which I think is lame. If the charset in smsbox
>>> was changed, at least some would have the possibility.
>>>
>>> All this being said, I do agree that using UTF-8 internally is the best
>>> way to go (but I assume that it will take a while before this is done).
>>>
>>>
>>> Med venlig hilsen / Best regards
>>>
>>> Peter Christensen
>>>
>>> Developer
>>> ------------------
>>> Cool Systems ApS
>>>
>>> Tel: +45 2888 1600
>>> @ : [EMAIL PROTECTED]
>>> www: www.coolsystems.dk
>>>
>>>
>>> Alexander Malysh wrote:
>>>> Hi,
>>>>
>>>> I don't see how your patch helps with the euro sign if the SMSC
>>>> supports only the GSM charset. Also, your patch is incomplete because
>>>> it changes only the SMPP module.
>>>>
>>>> What would be more suitable to support all GSM chars is to switch the
>>>> internal kannel charset to UTF-8. I have a patch somewhere, but it will
>>>> take some time to rebase it against current CVS and it's too intrusive
>>>> (not 1.4.1 material).
>>>>
>>>> For now it would be easy to keep latin1 as default but allow ESC (27)
>>>> to go through (in gwlib/charset.c change it from NRP to 27) and then
>>>> you should be able to send the euro sign via the sendsms interface.
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> Peter Christensen wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> At the request of Hillel, I have agreed to update my patch for the
>>>>> internal character set of smsbox/smpp, and post it here, hoping for it
>>>>> to be committed to CVS.
>>>>>
>>>>> It:
>>>>>
>>>>> * Changes the default 7-bit character set of smsbox to windows-1252
>>>>> instead of iso-8859-1, adding support for the euro sign. (Remember
>>>>> that the latin1/gsm conversion functions already assume windows-1252.)
>>>>>
>>>>> * smsbox uses charset_convert instead of octstr_recode, because the
>>>>> latter will convert the euro sign into an HTML entity.
>>>>>
>>>>> * Changes the internal 7-bit character set of SMPP to windows-1252.
>>>>>
>>>>> * Updates the documentation accordingly.
>>>>>
>>>>>
>>>>> The primary effect of this patch should be support for the € sign in
>>>>> both SMS transmission and reception (at least for gateways which
>>>>> utilize the latin1/gsm conversion functions). For the rest, this
>>>>> should have no effect since windows-1252 is identical to iso-8859-1
>>>>> except for 0x80-0x9F, which is unused in iso-8859-1.
>>>>>
>>>>> Just to clarify: Unless the problem is in octstr_recode, this patch
>>>>> ONLY adds support for the € (euro) sign. Other characters such as £
>>>>> (pound) also worked before. If a gateway didn't support £ before, it
>>>>> won't do it now either. Besides, this patch does NOT add support for
>>>>> Greek GSM characters!
>>>>>
>>

--
Thanks,
Alex
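
For reference, the escape lookup Peter describes can be sketched in C as below. This is only an illustration built on the gsm_escapes table quoted in the thread, not the code in gwlib/charset.c. The helper names gsm_escape_to_cp1252 and cp1252_to_gsm_escape and the GSM_ESC macro are hypothetical and do not come from Kannel.

```c
#include <assert.h>
#include <stddef.h>

/* GSM escape septet that introduces an extension character (0x1B = 27). */
#define GSM_ESC 0x1B

/* Escape table as quoted in the thread: the septet that follows GSM_ESC,
 * paired with the windows-1252 byte it stands for. */
static const struct {
    int gsmesc;
    int latin1;
} gsm_escapes[] = {
    { 10, 12 },   /* ASCII page break */
    { 20, '^' },
    { 40, '{' },
    { 41, '}' },
    { 47, '\\' },
    { 60, '[' },
    { 61, '~' },
    { 62, ']' },
    { 64, '|' },
    { 101, 128 }, /* euro: 0x1B 0x65 in GSM, 0x80 in windows-1252 */
    { -1, -1 }    /* end marker */
};

/* Hypothetical helper: given the septet that follows GSM_ESC, return the
 * windows-1252 byte, or -1 if the escape is not in the table. */
static int gsm_escape_to_cp1252(int septet)
{
    assert(septet >= 0 && septet <= 127);
    for (size_t i = 0; gsm_escapes[i].gsmesc >= 0; i++) {
        if (gsm_escapes[i].gsmesc == septet)
            return gsm_escapes[i].latin1;
    }
    return -1;
}

/* Hypothetical helper: given a windows-1252 byte, return the septet to send
 * after GSM_ESC, or -1 if the byte needs no escape sequence. */
static int cp1252_to_gsm_escape(int byte)
{
    assert(byte >= 0 && byte <= 255);
    for (size_t i = 0; gsm_escapes[i].gsmesc >= 0; i++) {
        if (gsm_escapes[i].latin1 == byte)
            return gsm_escapes[i].gsmesc;
    }
    return -1;
}
```

With this table, gsm_escape_to_cp1252(0x65) gives 0x80 and cp1252_to_gsm_escape(0x80) gives 101, which matches the "{ 101, 128 }" entry and the -101 at position 128 of the latin1_to_gsm array quoted above.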