... Well, my information was not correct. There are characters > 127 in the file, like "ř", "š"...
Each character is 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly. But I have a problem loading them into ConTeXt: I need to convert the bytes > 127 to UTF-8 sequences that ConTeXt accepts.

@Thomas: The table looks nice, but there are no entries for CP 1250 to UTF conversion.

I prepared some tables for character conversion and removal of diacritics (see the attachment); maybe it would be handy to include them in ConTeXt somehow.

Best regards,

Lukas

On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang <ges...@stud.uni-heidelberg.de> wrote:
> On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
>> Hello,
>>
>> I have many files with ASCII encoding; this encoding must be kept as these files are processed also by another program.
>>
>> When I work with them in ConTeXt, I need to convert them to UTF.
>
> Not needed, as every ASCII string is a valid UTF8 string:
>
>   “The UTF encoding has several good properties. By far the most important is that a byte in the ASCII range 0-127 represents itself in UTF. Thus UTF is backward compatible with ASCII.”
>   http://doc.cat-v.org/plan_9/4th_edition/papers/utf
>
> You can use them in LuaTeX without further conversion.
>
> Regards
> Philipp
>
>> Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
>>
>> Something like:
>>
>>   \startluacode
>>   local str = loadFile("a.txt") -- ASCII coded
>>   str = context.ASCII2UTF(str) -- Or something like this
>>   \stopluacode
>>
>> Best regards,
>>
>> Lukas
>>
>> --
>> Ing. Lukáš Procházka [mailto:l...@pontex.cz]
>> Pontex s. r. o. [mailto:pon...@pontex.cz] [http://www.pontex.cz]
>> Bezová 1658
>> 147 14 Praha 4
>> Tel: +420 244 062 238
>> Fax: +420 244 461 038
>>
>> ___________________________________________________________________________________
>> If your question is of interest to others as well, please add an entry to the Wiki!
>>
>> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
>> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
>> archive  : http://foundry.supelec.fr/projects/contextrev/
>> wiki     : http://contextgarden.net
>> ___________________________________________________________________________________
-- Ing. Lukáš Procházka [mailto:l...@pontex.cz] Pontex s. r. o. [mailto:pon...@pontex.cz] [http://www.pontex.cz] Bezová 1658 147 14 Praha 4 Tel: +420 244 062 238 Fax: +420 244 461 038
Cz2UTF.lua
Description: Binary data
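
The attachment itself is not reproduced in the archive. As a rough illustration of the kind of conversion described in the message (each byte > 127 of a CP 1250 string replaced by the corresponding UTF-8 sequence), a minimal plain-Lua sketch might look like the following. This is not the content of Cz2UTF.lua: the names cp1250, utf8char and cp1250_to_utf8 are hypothetical, and the table is deliberately partial (lowercase Czech letters only), so it would have to be extended for real data.

```lua
-- Partial CP 1250 byte -> Unicode code point table.
-- Assumption: only lowercase Czech letters are listed; extend as needed.
local cp1250 = {
  [0xE1] = 0x00E1, -- á
  [0xE8] = 0x010D, -- č
  [0xEF] = 0x010F, -- ď
  [0xE9] = 0x00E9, -- é
  [0xEC] = 0x011B, -- ě
  [0xED] = 0x00ED, -- í
  [0xF2] = 0x0148, -- ň
  [0xF3] = 0x00F3, -- ó
  [0xF8] = 0x0159, -- ř
  [0x9A] = 0x0161, -- š
  [0x9D] = 0x0165, -- ť
  [0xFA] = 0x00FA, -- ú
  [0xF9] = 0x016F, -- ů
  [0xFD] = 0x00FD, -- ý
  [0x9E] = 0x017E, -- ž
}

-- Encode one code point below U+0800 as UTF-8 (enough for the table above).
local function utf8char(cp)
  if cp < 0x80 then
    return string.char(cp)
  end
  return string.char(0xC0 + math.floor(cp / 64), 0x80 + cp % 64)
end

-- Replace every byte > 127 that has a table entry by its UTF-8 sequence.
-- Bytes without an entry are passed through unchanged, so the result is
-- valid UTF-8 only if the table covers every high byte in the input.
local function cp1250_to_utf8(s)
  return (s:gsub("[\128-\255]", function(c)
    local cp = cp1250[c:byte()]
    return cp and utf8char(cp) or c
  end))
end
```

Bytes in the ASCII range are never touched, which matches Philipp's point that pure ASCII input needs no conversion. Inside ConTeXt such a function could be called from a \startluacode ... \stopluacode block on the file contents before they are typeset.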