Re: [NTG-context] UTF conversion via Lua
Hello Hans,

thank you for the extension; I've tested it and it works perfectly.

On Thu, 16 Feb 2012 23:56:44 +0100, Hans Hagen pra...@wxs.nl wrote:
> regimes.toregime('8859-1',"abcde Ä","?")
>
> but you'll have to test and wikify it.

I'm going to wikify it. I suppose the signature is:

    regimes.toregime(target-regime, text-to-convert, third-arg)

So, a question: what is the third argument used for? Perhaps as the default character when a UTF code cannot be mapped to the target regime?

(That didn't happen in my case, so I can only guess what the third argument is for.)

Best regards,

Lukas

-- 
Ing. Lukáš Procházka [mailto:l...@pontex.cz]
Pontex s. r. o. [mailto:pon...@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4
Tel: +420 244 062 238
Fax: +420 244 461 038
___
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___
Re: [NTG-context] UTF conversion via Lua
On 17-2-2012 09:09, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> So, a question: what is the third argument used for? Perhaps as the
> default character when a UTF code cannot be mapped to the target regime?

yes

> (That didn't happen in my case, so I can only guess what the third
> argument is for.)

then you should make a test for it (just take some Chinese character and see if it becomes a ?)

Hans

-
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-
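A minimal way to run the test Hans suggests might look like the sketch below. It assumes the three-argument form of regimes.toregime quoted in this thread; the helper name to_latin1 is made up for this example, and the guard lets the file load outside ConTeXt, where the regimes table does not exist.

```lua
-- Sketch only: to_latin1 is a hypothetical helper, not part of ConTeXt.
local function to_latin1(text)
    if regimes and regimes.toregime then
        -- Third argument: the replacement for characters that have no
        -- slot in the target regime (as confirmed by Hans above).
        return regimes.toregime("8859-1", text, "?")
    end
    return nil -- not running inside ConTeXt
end

-- "\228\184\173" is the UTF-8 byte sequence of the Chinese character U+4E2D.
local result = to_latin1("abc \228\184\173")

-- Inside ConTeXt, per Hans's reply, the Chinese character should come
-- back as "?"; outside ConTeXt the helper simply returns nil.
print(result)
```

Running this from a \startluacode block and looking at the log should show whether the fallback character is really used.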
Re: [NTG-context] UTF conversion via Lua
On Fri, 17 Feb 2012 09:19:16 +0100, Hans Hagen pra...@wxs.nl wrote:
> On Thu, 16 Feb 2012 23:56:44 +0100, Hans Hagen pra...@wxs.nl wrote:
>> regimes.toregime('8859-1',"abcde Ä","?")
>>
>> but you'll have to test and wikify it.

Wikified: http://wiki.contextgarden.net/Encodings_and_Regimes#Conversion_between_encodings

Lukas
Re: [NTG-context] UTF conversion via Lua
Hello,

one more question: does regimes.translate allow translating between arbitrary encodings, or only from the specified encoding to the current one?

    str = regimes.translate(str, "cp1250") -- translates from cp1250 to the current encoding (UTF)? Or always to UTF?

I'm looking for something like:

    src_enc = "utf8"
    tgt_enc = "cp1250"
    str = regimes.translate(str, src_enc, tgt_enc)

Any idea?

Best regards,

Lukas

On Fri, 10 Feb 2012 13:25:40 +0100, Wolfgang Schuster schuster.wolfg...@googlemail.com wrote:
> str = regimes.translate(str,"cp1250")
Re: [NTG-context] UTF conversion via Lua
On 16-2-2012 12:13, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> one more question: does regimes.translate allow translating between
> arbitrary encodings, or only from the specified encoding to the current one?

no, although it's no big deal to provide that (of course there is then the matter of utf being more complete than the target)

> I'm looking for something like:
>
>     src_enc = "utf8"
>     tgt_enc = "cp1250"
>     str = regimes.translate(str, src_enc, tgt_enc)
>
> Any idea?

is there a reason not to stick to utf?

Hans
Re: [NTG-context] UTF conversion via Lua (renaming attachments to .scr_)
On Thu, 16 Feb 2012 13:08:09 +0100, Hans Hagen pra...@wxs.nl wrote:
> no, although it's no big deal to provide that (of course there is then
> the matter of utf being more complete than the target)
>
>> src_enc = "utf8"
>> tgt_enc = "cp1250"
>> str = regimes.translate(str, src_enc, tgt_enc)
>
> is there a reason not to stick to utf?

Well, I'm working with a .cld document (with UTF encoding). There are some functions which typeset texts, and there is also a part which creates a .scr file.

.scr files are sequences of AutoCAD commands; their contents are passed directly to the AutoCAD command prompt.

When AutoCAD creates a text entity, it reads the input stream (in our case, the .scr file) BYTE BY BYTE. When the bytes represent text to be drawn, unknown bytes (bytes that have no graphical representation in the AutoCAD font file, a "shape file" in AutoCAD's terminology) are shown as ?.

Of course, a valid representation of language-specific characters (like čřž... in Czech) requires an appropriate .shx (compiled shape) file.

Anyway, when AutoCAD is to write č, it requires just ONE BYTE to be passed to it, so the 2-byte UTF representation gives a bad result (??).

So, back to the original question: when I call the .cld function that writes a command to the .scr file, I need to convert a UTF string back to CP 1250.

Would it be possible to provide this?

NB: Two example .scr files are attached; CP1250.scr works well in AutoCAD, while the other one (UTF.scr) draws ST instead of ČÁST.

Kind regards,

Lukas

Attachments: UTF.scr_, CP1250.scr_ (binary data)
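The byte counts behind this problem can be checked in plain Lua. The sketch below spells out "č" (U+010D) as raw bytes in both encodings; the helper name hex is made up for this example.

```lua
-- "č" (U+010D) written as raw bytes in the two encodings discussed above.
local c_utf8   = "\196\141" -- UTF-8:   0xC4 0x8D, two bytes
local c_cp1250 = "\232"     -- CP 1250: 0xE8, one byte

-- Hypothetical helper: render a byte string as space-separated hex.
local function hex(s)
    local out = {}
    for i = 1, #s do
        out[#out + 1] = string.format("%02X", s:byte(i))
    end
    return table.concat(out, " ")
end

print(#c_utf8, hex(c_utf8))     -- length 2, bytes C4 8D
print(#c_cp1250, hex(c_cp1250)) -- length 1, byte E8
```

A program that reads one byte per character, as AutoCAD is described to do here, sees two unknown bytes in the UTF-8 case and one known byte in the CP 1250 case.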
Re: [NTG-context] UTF conversion via Lua (renaming attachments to .scr_)
On 16-2-2012 14:14, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> Would it be possible to provide this?

I'll provide:

    regimes.toregime('8859-1',"abcde Ä","?")

but you'll have to test and wikify it.

Hans
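With such a function, the .scr writer could look roughly like the sketch below. This is an assumption-laden illustration: write_scr is a made-up name, and passing "cp1250" as the target regime is inferred from the regimes.translate call elsewhere in this thread and has not been verified. Outside ConTeXt the text is written unchanged.

```lua
-- Sketch only: write_scr is a hypothetical helper, not part of ConTeXt.
local function write_scr(filename, utf8_text)
    local bytes = utf8_text
    if regimes and regimes.toregime then
        -- Assumed regime name "cp1250"; "?" replaces unmappable characters.
        bytes = regimes.toregime("cp1250", utf8_text, "?")
    end
    -- Binary mode keeps the bytes exactly as converted (no newline translation).
    local fh = assert(io.open(filename, "wb"))
    fh:write(bytes)
    fh:close()
end
```

Opening the file in binary mode matters on Windows, where text mode would rewrite line endings in the byte stream that AutoCAD reads.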
Re: [NTG-context] UTF conversion via Lua
On Fri, 10 Feb 2012 12:14:15 +0100, luigi scarso wrote:
> if you mean ASCII with coderange 0-255 *and* ISO-8859-1 (Latin 1)
> encoding there is no need to conversion;

This is not true. You are mixing up unicode positions and utf8 encoding. E.g. ä has the same position in unicode and latin1 (dec 228, hex E4). But its utf8 code consists of 16 bits (1100001110100100, hex c3a4) while its latin 1 code is 8 bits long (11100100).

-- 
Ulrike Fischer
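The arithmetic behind Ulrike's example can be reproduced in a few lines of plain Lua. This is a minimal sketch that handles only the two-byte range of UTF-8 (code points 0x80 to 0x7FF); the function name utf8_two_byte is made up for this example.

```lua
-- Encode a code point from the two-byte UTF-8 range (0x80 .. 0x7FF).
local function utf8_two_byte(cp)
    assert(cp >= 0x80 and cp < 0x800, "only two-byte code points handled")
    local lead  = 0xC0 + math.floor(cp / 64) -- 110xxxxx: upper 5 bits
    local trail = 0x80 + cp % 64             -- 10xxxxxx: lower 6 bits
    return string.char(lead, trail)
end

local a_utf8   = utf8_two_byte(228) -- ä, code point 228 (hex E4)
local a_latin1 = string.char(228)   -- the same character as one Latin 1 byte

print(string.format("%02X %02X", a_utf8:byte(1, 2))) -- C3 A4
print(string.format("%02X", a_latin1:byte()))        -- E4
```

The code point is the same in both cases; only the byte sequence differs, which is why Latin 1 input above 127 is not valid UTF-8 as it stands.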
Re: [NTG-context] UTF conversion via Lua
On Mon, Feb 13, 2012 at 12:42 PM, Ulrike Fischer ne...@nililand.de wrote:
> This is not true. You are mixing up unicode positions and utf8 encoding.
> E.g. ä has the same position in unicode and latin1 (dec 228, hex E4).
> But its utf8 code consists of 16 bits (1100001110100100, hex c3a4)
> while its latin 1 code is 8 bits long (11100100).

ah yes you are right -- I made the implicit assumption that his file was already utf-8 encoded. I have been using only utf-8 for a long time and I had almost forgotten about

    ! String contains an invalid utf-8 sequence.
    system tex error on line 10 in file t1.txt: String contains an invalid utf-8 sequence ...

(I believe he hit this error during his later attempts, because he wrote "I cannot \input the file as this is not a valid ConTeXt source.")

What I meant was, as I wrote below, "To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1 (Latin 1) characters are assigned Unicode/UCS code points that are the same as their codes in the earlier standards", and this is true only for iso-8859-1.

-- 
luigi
[NTG-context] UTF conversion via Lua
Hello,

I have many files with ASCII encoding; this encoding must be kept, as these files are also processed by another program. When I work with them in ConTeXt, I need to convert them to UTF.

Does Lua (in ConTeXt scope) offer a transformation function, or a table of chars [ASCII-code] -> [UTF-code], or anything else to provide the conversion?

Something like:

    \startluacode
    local str = loadFile("a.txt") -- ASCII coded
    str = context.ASCII2UTF(str) -- Or something like this
    \stopluacode

Best regards,

Lukas
Re: [NTG-context] UTF conversion via Lua
On 02/10/12 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> Does Lua (in ConTeXt scope) offer a transformation function, or a table
> of chars [ASCII-code] -> [UTF-code], or anything else to provide the
> conversion?

Have a look at tex/texmf-context/scripts/context/lua/mtx-babel.lua. That's a converter Hans wrote a while ago for a similar problem I had. I don't know if it still works out of the box, but it should help you get an idea of what you could do.

Thomas
Re: [NTG-context] UTF conversion via Lua
On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> I have many files with ASCII encoding; this encoding must be kept, as
> these files are also processed by another program. When I work with
> them in ConTeXt, I need to convert them to UTF.

Not needed, as every ASCII string is a valid UTF8 string:

  “The UTF encoding has several good properties. By far the most important is that a byte in the ASCII range 0-127 represents itself in UTF. Thus UTF is backward compatible with ASCII.”
  http://doc.cat-v.org/plan_9/4th_edition/papers/utf

You can use them in Luatex without further conversion.

Regards
Philipp

-- 
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
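Whether a file really is plain ASCII is easy to check before deciding on a conversion. A minimal sketch in plain Lua follows; the function name is_ascii is made up for this example.

```lua
-- True when the string contains no byte above 127, i.e. it is pure ASCII
-- and therefore already valid UTF-8.
local function is_ascii(s)
    return s:find("[\128-\255]") == nil
end

print(is_ascii("plain text")) -- true
print(is_ascii("\248\154"))   -- false: bytes 0xF8 0x9A ("řš" in CP 1250)
```

If the check fails, the file is in some 8-bit encoding (or already UTF-8), and the encoding has to be known before anything can be converted.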
Re: [NTG-context] UTF conversion via Lua (now with attachment)
... Well, my information was not correct. There are characters > 127 in the file, like ř, š... Each character is 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.

But I have a problem loading them into ConTeXt. I need to convert the bytes > 127 to UTF sequences that ConTeXt accepts.

@Thomas: The table looks nice, but there are no entries for CP 1250 to UTF conversion.

I prepared some tables: character conversion and removal of diacritics (see the attachment); maybe it would be handy to include them in ConTeXt somehow.

Best regards,

Lukas

On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang ges...@stud.uni-heidelberg.de wrote:
> Not needed, as every ASCII string is a valid UTF8 string [...]
> You can use them in Luatex without further conversion.

Attachment: Cz2UTF.lua (binary data)
Re: [NTG-context] UTF conversion via Lua
2012/2/10 Procházka Lukáš Ing. - Pontex s. r. o. l...@pontex.cz:
> ... Well, my information was not correct. There are characters > 127 in
> the file, like ř, š... Each character is 1 byte, and as I'm using
> Windows with CP 1250, the characters are displayed correctly.
>
> But I have a problem loading them into ConTeXt. I need to convert the
> bytes > 127 to UTF sequences that ConTeXt accepts.

To avoid confusion:
if you mean ASCII with code range 0-127, there is no need for conversion;
if you mean ASCII with code range 0-255 *and* ISO-8859-1 (Latin 1) encoding, there is no need for conversion;
otherwise you need to specify an encoding (i.e. CP 1250)

From wikipedia:

  "Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a much wider array of characters, and their various encoding forms have begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments. While ASCII is limited to 128 characters, Unicode and the UCS support more characters by separating the concepts of unique identification (using natural numbers called code points) and encoding (to 8-, 16- or 32-bit binary formats, called UTF-8, UTF-16 and UTF-32).

  To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1 (Latin 1) characters are assigned Unicode/UCS code points that are the same as their codes in the earlier standards. Therefore, ASCII can be considered a 7-bit encoding scheme for a very small subset of Unicode/UCS, and, conversely, the UTF-8 encoding forms are binary-compatible with ASCII for code points below 128, meaning all ASCII is valid UTF-8. The other encoding forms resemble ASCII in how they represent the first 128 characters of Unicode, but use 16 or 32 bits per character, so they require conversion for compatibility. (similarly UCS-2 is upwards compatible with UTF-16)"

If you have iconv, converting between encodings is easy; you can always call it as an external program with os.execute(cmd)

-- 
luigi
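Calling iconv from Lua, as luigi suggests, could be sketched as follows. The helper name iconv_command and the file names are made up for this example, the quoting is deliberately naive (it breaks on file names that contain double quotes), and the call itself is left commented out because it needs iconv on the PATH.

```lua
-- Build a shell command that converts infile from one encoding to another.
-- Hypothetical helper; the quoting only covers simple file names.
local function iconv_command(from, to, infile, outfile)
    return string.format('iconv -f %s -t %s "%s" > "%s"', from, to, infile, outfile)
end

local cmd = iconv_command("CP1250", "UTF-8", "a.txt", "a-utf8.txt")
print(cmd) -- iconv -f CP1250 -t UTF-8 "a.txt" > "a-utf8.txt"

-- os.execute(cmd) -- uncomment to run the conversion
```

The converted copy can then be read normally, while the original file keeps its encoding for the other program.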
Re: [NTG-context] UTF conversion via Lua
On 10.02.2012 at 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> But I have a problem loading them into ConTeXt. I need to convert the
> bytes > 127 to UTF sequences that ConTeXt accepts.
>
> I prepared some tables: character conversion and removal of diacritics
> (see the attachment); maybe it would be handy to include them in
> ConTeXt somehow.

Why don’t you let ConTeXt do the conversion:

    \starttext

    this is something in utf8

    \startregime[cp1250]
    \input filewithcp1250encoding
    \stopregime

    more text encoded in utf8

    \stoptext

Wolfgang
Re: [NTG-context] UTF conversion via Lua
On 2012-02-10 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> ... Well, my information was not correct. There are characters > 127 in
> the file, like ř, š... Each character is 1 byte, and as I'm using
> Windows with CP 1250, the characters are displayed correctly.

So it wasn’t ASCII after all ;-) No problem, just use iconv:

    iconv -f CP1250 -t UTF8 infile > outfile

I do this a lot with movie subtitles …

Hth, Philipp

PS: If you still insist on converting at the Lua end only, then your starting point might be “regi-cp1250.lua” in the Context base/ dir.
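The idea behind a table-driven conversion at the Lua end can be sketched in plain Lua. The table below is a four-entry excerpt written for this example and is not the content of regi-cp1250.lua; the function names are made up, and the encoder only covers code points below 0x800, which is enough for these entries.

```lua
-- Tiny excerpt of a CP 1250 -> Unicode mapping (byte value -> code point).
local cp1250 = {
    [0x9A] = 0x0161, -- š
    [0xE8] = 0x010D, -- č
    [0xF8] = 0x0159, -- ř
    [0xFD] = 0x00FD, -- ý
}

-- Encode a code point below 0x800 as UTF-8 (one or two bytes).
local function utf8char(cp)
    if cp < 0x80 then
        return string.char(cp)
    end
    return string.char(0xC0 + math.floor(cp / 64), 0x80 + cp % 64)
end

-- Replace every byte above 127 by its UTF-8 sequence; ASCII passes through.
local function cp1250_to_utf8(s)
    return (s:gsub("[\128-\255]", function(ch)
        local cp = cp1250[ch:byte()]
        return cp and utf8char(cp) or "?" -- bytes missing from the excerpt
    end))
end

print(cp1250_to_utf8("abc \253")) -- abc ý (when shown on a UTF-8 terminal)
```

A complete converter needs all 128 upper entries of the code page; inside ConTeXt, regimes.translate already does this job.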
Re: [NTG-context] UTF conversion via Lua
... \enableregime - nice idea!

Despite this, I still can't get the example to work:

Test.mkiv

    \enableregime[cp1250]

    \starttext
    \startluacode
    function loadFile(fn)
      local fh = assert(io.open(fn, "r"))
      local str = fh:read("*all")
      fh:close()
      return str
    end

    context.startregime{"cp1250"}
    context(loadFile("a.txt"))
    context.stopregime()
    \stopluacode
    \stoptext

Where's the problem?

Lukas

On Fri, 10 Feb 2012 12:15:29 +0100, Wolfgang Schuster schuster.wolfg...@googlemail.com wrote:
> Why don’t you let ConTeXt do the conversion:
>
>     \starttext
>     this is something in utf8
>     \startregime[cp1250]
>     \input filewithcp1250encoding
>     \stopregime
>     more text encoded in utf8
>     \stoptext

abc ý

Attachment: Test.mkiv (binary data)
Re: [NTG-context] UTF conversion via Lua
One more note:

On Fri, 10 Feb 2012 12:15:29 +0100, Wolfgang Schuster schuster.wolfg...@googlemail.com wrote:
> Why don’t you let ConTeXt do the conversion:
>
>     \starttext
>     this is something in utf8
>     \startregime[cp1250]
>     \input filewithcp1250encoding
>     \stopregime
>     more text encoded in utf8
>     \stoptext

I cannot \input the file, as it is not a valid ConTeXt source. I do (at least) a % -> \% conversion; that's why I need to use Lua to load the file into a string. To keep it simple, the conversion step was removed from the sample sent previously.

Lukas
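The % -> \% step mentioned here could be done with string.gsub, for instance as in the sketch below. The function name escape_percent is made up for this example. Note that % is special both in Lua patterns and in gsub replacement strings, so it has to be doubled in each.

```lua
-- Escape percent signs so TeX does not treat them as comment starts.
local function escape_percent(s)
    -- Pattern "%%" matches a literal %; replacement "\\%%" yields \%.
    return (s:gsub("%%", "\\%%"))
end

print(escape_percent("50% done")) -- 50\% done
```

The outer parentheses drop gsub's second return value (the number of replacements), so the function returns only the string.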
Re: [NTG-context] UTF conversion via Lua
On 10.02.2012 at 12:32, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> ... \enableregime - nice idea!
>
> Despite this, I still can't get the example to work:
>
>     context.startregime{"cp1250"}
>     context(loadFile("a.txt"))
>     context.stopregime()
>
> Where's the problem?

Dunno, but it works when you use “regimes.translate” in your code. It’s better, though, to ask Hans for a function in the commands namespace which you can use.

    \starttext
    \startluacode
    function loadFile(fn)
      local fh = assert(io.open(fn, "r"))
      local str = fh:read("*all")
      fh:close()
      str = regimes.translate(str,"cp1250")
      context(str)
    end

    loadFile("a.txt")
    \stopluacode
    \stoptext

Wolfgang
Re: [NTG-context] UTF conversion via Lua
> Dunno, but it works when you use “regimes.translate” in your code. It’s
> better, though, to ask Hans for a function in the commands namespace
> which you can use.
>
>     str = regimes.translate(str,"cp1250")
>     context(str)
>
> Wolfgang

Thank you, Wolfgang.

Your code works perfectly and does exactly what I need.

Best regards,

Lukas
Re: [NTG-context] UTF conversion via Lua
On 10-2-2012 14:15, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> Thank you, Wolfgang.
>
> Your code works perfectly and does exactly what I need.

As oneliner ...

    function document.MyLoadFile(name)
        context(regimes.translate(io.loaddata(resolvers.findfile(name)),"cp1250"))
    end

(resolvers will look up in the tree if needed)

Hans