Re: [NTG-context] UTF conversion via Lua

2012-02-17 Thread Procházka Lukáš Ing . - Pontex s . r . o .

Hello Hans,

thank you for the extension; I've tested and it works perfectly.

On Thu, 16 Feb 2012 23:56:44 +0100, Hans Hagen pra...@wxs.nl wrote:


regimes.toregime('8859-1',"abcde Ä","?")

but you'll have to test and wikify it.


I'm going to wikify it -

- I suppose the signature is:

regimes.toregime(target-regime, text-to-convert, third-arg)

So a question - what is the third argument used for?

Maybe as the default character used when a UTF character cannot be mapped to 
the target regime?

(That didn't happen in my case, so I can only guess what third-arg is for.)

Best regards,

Lukas



Hans



--
Ing. Lukáš Procházka [mailto:l...@pontex.cz]
Pontex s. r. o.  [mailto:pon...@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038

___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___


Re: [NTG-context] UTF conversion via Lua

2012-02-17 Thread Hans Hagen

On 17-2-2012 09:09, Procházka Lukáš Ing. - Pontex s. r. o. wrote:

Hello Hans,

thank you for the extension; I've tested and it works perfectly.

On Thu, 16 Feb 2012 23:56:44 +0100, Hans Hagen pra...@wxs.nl wrote:


regimes.toregime('8859-1',"abcde Ä","?")

but you'll have to test and wikify it.


I'm going to wikify it -

- I suppose the signature is:

regimes.toregime(target-regime, text-to-convert, third-arg)

So a question - what is the third argument used for?

Maybe as the default character used when a UTF character cannot be mapped to
the target regime?


yes


(That didn't happen in my case, so I can only guess what third-arg is
for.)


then you should make a test for it (just take some Chinese character and 
see if it becomes a ?)
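
The suggested test can be pictured with a plain-Lua sketch that runs outside ConTeXt. The function `to_single_byte` and the tiny `latin1_subset` table are hypothetical stand-ins made up for illustration; they are not the `regimes.toregime` implementation, and only the idea of a fallback character follows the call quoted above.

```lua
-- Hypothetical stand-in for a UTF-8 -> single-byte conversion with a
-- fallback character. This is NOT the ConTeXt implementation.

-- Tiny illustrative mapping: UTF-8 sequence -> ISO-8859-1 byte.
local latin1_subset = {
  ["\195\132"] = "\196", -- Ä (U+00C4) -> 0xC4
  ["\195\164"] = "\228", -- ä (U+00E4) -> 0xE4
}

local function to_single_byte(map, text, fallback)
  -- Match one UTF-8 encoded character at a time (lead byte plus
  -- continuation bytes); plain ASCII passes through unchanged.
  return (text:gsub("[\1-\127\194-\244][\128-\191]*", function(ch)
    if #ch == 1 then
      return ch
    end
    return map[ch] or fallback
  end))
end

-- "abcde Ä" converts cleanly; the Chinese character U+4E2D (UTF-8
-- bytes E4 B8 AD) has no Latin-1 equivalent and becomes the fallback.
local ok  = to_single_byte(latin1_subset, "abcde \195\132", "?")
local bad = to_single_byte(latin1_subset, "\228\184\173", "?")

print(#ok, bad)
```

With the real function, the same check would mean passing a string containing a Chinese character and looking for the fallback character in the result.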


Hans


-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] UTF conversion via Lua

2012-02-17 Thread Procházka Lukáš Ing . - Pontex s . r . o .

On Fri, 17 Feb 2012 09:19:16 +0100, Hans Hagen pra...@wxs.nl wrote:


On Thu, 16 Feb 2012 23:56:44 +0100, Hans Hagen pra...@wxs.nl wrote:


regimes.toregime('8859-1',"abcde Ä","?")

but you'll have to test and wikify it.


Wikified - 
http://wiki.contextgarden.net/Encodings_and_Regimes#Conversion_between_encodings.

Lukas


--
Ing. Lukáš Procházka [mailto:l...@pontex.cz]
Pontex s. r. o.  [mailto:pon...@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038



Re: [NTG-context] UTF conversion via Lua

2012-02-16 Thread Procházka Lukáš Ing . - Pontex s . r . o .

Hello,

one more question -

- does regimes.translate allow translating between arbitrary encodings, or 
only from the specified one to the current one?


  -- Translate from cp1250 to the current encoding (UTF) (or always to UTF?)
  str = regimes.translate(str, "cp1250")


I'm looking for something like:


  src_enc = "utf8"
  tgt_enc = "cp1250"

  str = regimes.translate(str, src_enc, tgt_enc)


Any idea?

Best regards,

Lukas


On Fri, 10 Feb 2012 13:25:40 +0100, Wolfgang Schuster 
schuster.wolfg...@googlemail.com wrote:


str = regimes.translate(str,"cp1250")



--
Ing. Lukáš Procházka [mailto:l...@pontex.cz]
Pontex s. r. o.  [mailto:pon...@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038


Re: [NTG-context] UTF conversion via Lua

2012-02-16 Thread Hans Hagen

On 16-2-2012 12:13, Procházka Lukáš Ing. - Pontex s. r. o. wrote:

Hello,

one more question -

- does regimes.translate allow translating between arbitrary
encodings, or only from the specified one to the current one?


no, although it's no big deal to provide that (of course there is then 
the matter of utf being more complete than the target)




-- Translate from cp1250 to the current encoding (UTF) (or always to UTF?)
str = regimes.translate(str, "cp1250")


I'm looking for something like:


src_enc = "utf8"
tgt_enc = "cp1250"

str = regimes.translate(str, src_enc, tgt_enc)


Any idea?


is there a reason not to stick to utf?

Hans


-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-

Re: [NTG-context] UTF conversion via Lua (renaming attachments to .scr_)

2012-02-16 Thread Procházka Lukáš Ing . - Pontex s . r . o .

On Thu, 16 Feb 2012 13:08:09 +0100, Hans Hagen pra...@wxs.nl wrote:


no, although it's no big deal to provide that (of course there is then
the matter of utf being more complete than the target)



src_enc = "utf8"
tgt_enc = "cp1250"

str = regimes.translate(str, src_enc, tgt_enc)



is there a reason not to stick to utf?

Hans


Well - I'm working with a .cld document (UTF-encoded). It contains some 
functions which typeset text, and also a part which creates a .scr file.

.scr files are sequences of AutoCAD commands - their contents are passed 
directly to the AutoCAD command prompt.

When AutoCAD creates a text entity, it reads the input stream (in our case: the .scr file) 
byte by byte. When the bytes represent text to be drawn, unknown bytes (= bytes that have no 
graphical representation in the AutoCAD font file (shape file in AutoCAD's terminology)) 
are shown as ?.

Of course, a valid representation of language-specific characters (like čřž... in Czech) 
requires an appropriate .shx (= compiled shape) file.

Anyway, when AutoCAD is to write č, it requires just ONE BYTE to be passed to it; so 
the 2-byte UTF representation gives a bad result (= ??).
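
The one-byte versus two-byte difference can be made visible with a small plain-Lua sketch. The helper `hex` is made up for this illustration, and the byte values are written as decimal escapes so that the result does not depend on the encoding of the source file.

```lua
-- Render every byte of a string as two hex digits.
local function hex(s)
  return (s:gsub(".", function(c)
    return string.format("%02X ", c:byte())
  end))
end

-- The word ČÁST in both encodings, spelled out byte by byte.
local utf8_text   = "\196\140\195\129ST" -- Č = C4 8C, Á = C3 81
local cp1250_text = "\200\193ST"         -- Č = C8,    Á = C1

print(#utf8_text,   hex(utf8_text))
print(#cp1250_text, hex(cp1250_text))
```

The UTF-8 string is six bytes long and the CP 1250 string four, so a program that treats each byte as one character sees two extra, unknown characters in the UTF-8 case.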

So, back to the original question: when I call the .cld function that writes a command to 
the .scr file, I need to convert a UTF string back to CP 1250.

Would it be possible to provide this?

NB: Two example .scr files are attached; CP1250.scr works well in AutoCAD, while the other one 
(UTF.scr) draws ST instead of ČÁST.

Kind regards,

Lukas


--
Ing. Lukáš Procházka [mailto:l...@pontex.cz]
Pontex s. r. o.  [mailto:pon...@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038

UTF.scr_
Description: Binary data


CP1250.scr_
Description: Binary data

Re: [NTG-context] UTF conversion via Lua (renaming attachments to .scr_)

2012-02-16 Thread Hans Hagen

On 16-2-2012 14:14, Procházka Lukáš Ing. - Pontex s. r. o. wrote:


Would it be possible to provide this?


I'll provide:

regimes.toregime('8859-1',"abcde Ä","?")

but you'll have to test and wikify it.

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] UTF conversion via Lua

2012-02-13 Thread Ulrike Fischer
Am Fri, 10 Feb 2012 12:14:15 +0100 schrieb luigi scarso:

 if you mean ASCII with coderange 0-255 *and*  ISO-8859-1 (Latin 1)
 encoding there is no need for conversion;

This is not true. You are mixing up unicode positions and utf8
encoding.

E.g. ä has the same position in unicode and latin1 (dec 228, hex
E4). But its utf8 code consists of 16 bits (1100001110100100, hex
c3a4) while its latin 1 code is 8 bits long (11100100).
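
The difference between a code point and its encoded bytes can be reproduced with a short plain-Lua sketch. The helpers `utf8_encode` and `bits` are written for this illustration only, and the encoder deliberately handles just code points below U+0800 (one or two bytes).

```lua
-- Encode a Unicode code point below U+0800 as UTF-8 (one or two bytes).
local function utf8_encode(cp)
  if cp < 128 then
    return string.char(cp)
  end
  assert(cp < 2048, "this sketch only handles code points below U+0800")
  local lead = 192 + math.floor(cp / 64) -- 110xxxxx
  local cont = 128 + cp % 64             -- 10xxxxxx
  return string.char(lead, cont)
end

-- Render a string as groups of eight binary digits.
local function bits(s)
  local out = {}
  for i = 1, #s do
    local byte, digits = s:byte(i), {}
    for k = 7, 0, -1 do
      digits[#digits + 1] = math.floor(byte / 2 ^ k) % 2
    end
    out[#out + 1] = table.concat(digits)
  end
  return table.concat(out, " ")
end

print(bits(string.char(228)))   -- Latin-1 encoding of ä: one byte
print(bits(utf8_encode(228)))   -- UTF-8 encoding of ä: two bytes
```

Both lines start from the same number, 228; only the second one applies the UTF-8 bit layout, which is why the byte sequences differ.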


-- 
Ulrike Fischer 



Re: [NTG-context] UTF conversion via Lua

2012-02-13 Thread luigi scarso
On Mon, Feb 13, 2012 at 12:42 PM, Ulrike Fischer ne...@nililand.de wrote:
 Am Fri, 10 Feb 2012 12:14:15 +0100 schrieb luigi scarso:

 if you mean ASCII with coderange 0-255 *and*  ISO-8859-1 (Latin 1)
 encoding there is no need for conversion;

 This is not true. You are mixing up unicode positions and utf8
 encoding.

 E.g. ä has the same position in unicode and latin1 (dec 228, hex
 E4). But its utf8 code consists of 16 bits (1100001110100100, hex
 c3a4) while its latin 1 code is 8 bits long (11100100).
Ah yes, you are right -- I made the implicit assumption that his
file was already utf-8 encoded.
I have been using only utf-8 for a long time, and I had almost forgotten about

! String contains an invalid utf-8 sequence.

system   tex  error on line 10 in file t1.txt: String
contains an invalid utf-8 sequence ...


(I believe he met this error during his next tries, because he wrote
"I cannot \input the file as this is not a valid ConTeXt source.")
What I meant was, as I wrote below,
"To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1
(Latin 1) characters are assigned Unicode/UCS code points that are the
same as their codes in the earlier standards"
and this is true only for iso-8859-1.

-- 
luigi

[NTG-context] UTF conversion via Lua

2012-02-10 Thread Procházka Lukáš Ing . - Pontex s . r . o .

Hello,

I have many files with ASCII encoding; this encoding must be kept as these 
files are also processed by another program.

When I work with them in ConTeXt, I need to convert them to UTF.

Does Lua (in ConTeXt scope) offer a transformation function, or a table of chars 
[ASCII-code] -> [UTF-code], or anything else that provides the conversion?

Something like:

\startluacode
  local str = loadFile("a.txt") -- ASCII coded

  str = context.ASCII2UTF(str) -- Or something like this
\stopluacode

Best regards,

Lukas


--
Ing. Lukáš Procházka [mailto:l...@pontex.cz]
Pontex s. r. o.  [mailto:pon...@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038



Re: [NTG-context] UTF conversion via Lua

2012-02-10 Thread Thomas A. Schmitz

On 02/10/12 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:

Hello,

I have many files with ASCII encoding; this encoding must be kept as
these files are also processed by another program.

When I work with them in ConTeXt, I need to convert them to UTF.

Does Lua (in ConTeXt scope) offer a transformation function, or a table
of chars [ASCII-code] -> [UTF-code], or anything else that provides the conversion?

Something like:

\startluacode
   local str = loadFile("a.txt") -- ASCII coded

   str = context.ASCII2UTF(str) -- Or something like this
\stopluacode

Best regards,

Lukas


Have a look at tex/texmf-context/scripts/context/lua/mtx-babel.lua. 
That's a converter Hans wrote a while ago for a similar problem I had. I 
don't know if it still works out of the box, but it should help you get 
an idea what you could do.


Thomas

Re: [NTG-context] UTF conversion via Lua

2012-02-10 Thread Philipp Gesang
On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
 Hello,
 
 I have many files with ASCII encoding; this encoding must be kept as these 
 files are processed also by another program.
 
 When I work with them in ConTeXt, I need to convert them to UTF.

Not needed, as every ASCII string is a valid UTF8  string:
   “The UTF encoding has several good properties. By far the most
important is that a byte in the ASCII range 0-127 represents
itself in UTF. Thus UTF is backward compatible with ASCII.”
http://doc.cat-v.org/plan_9/4th_edition/papers/utf
You can use them in Luatex without further conversion.
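
Whether a given file really is pure ASCII can be checked before relying on this property. The following plain-Lua sketch uses a hypothetical helper `is_ascii`; it only tests the byte range and makes no claim about which 8-bit encoding a failing file uses.

```lua
-- True when every byte of s is in the ASCII range 0-127.
local function is_ascii(s)
  return s:find("[\128-\255]") == nil
end

print(is_ascii("plain text"))   -- only bytes below 128
print(is_ascii("\248\154"))     -- two bytes above 127
```

A string that passes this check is valid UTF-8 as it stands; one that fails needs a known source encoding before it can be converted.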

Regards
Philipp


 

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments




Re: [NTG-context] UTF conversion via Lua (now with attachment)

2012-02-10 Thread Procházka Lukáš Ing . - Pontex s . r . o .

... Well, my information was not correct.

There are characters > 127 in the file, like ř, š...

Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are 
displayed correctly.

But I have a problem loading them into ConTeXt.

I need to convert the bytes > 127 to UTF sequences which would be accepted by 
ConTeXt.

@Thomas:

The table looks nice, but there are no entries for CP 1250 to UTF conversion.

I prepared some tables: character conversion and removal of diacritics (see the 
attachment);
maybe it would be useful to include them in ConTeXt somehow.

Best regards,

Lukas







--
Ing. Lukáš Procházka [mailto:l...@pontex.cz]
Pontex s. r. o.  [mailto:pon...@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038

Cz2UTF.lua
Description: Binary data

Re: [NTG-context] UTF conversion via Lua

2012-02-10 Thread luigi scarso
2012/2/10 Procházka Lukáš Ing. - Pontex s. r. o. l...@pontex.cz:
 ... Well, my information was not correct.

 There are characters > 127 in the file, like ř, š...

 Each char = 1 byte, and as I'm using Windows with CP 1250, the characters
 are displayed correctly.

 But I have a problem loading them into ConTeXt.

 I need to convert the bytes > 127 to UTF sequences which would be accepted
 by ConTeXt.

 @Thomas:

 The table looks nice, but there are no entries for CP 1250 to UTF conversion.

 I prepared some tables: character conversion and removal of diacritics (see
 the attachment);
 maybe it would be useful to include them in ConTeXt somehow.

 Best regards,

 Lukas

To avoid confusion:
If you mean ASCII with coderange 0-127, there is no need for conversion;
if you mean ASCII with coderange 0-255 *and*  ISO-8859-1 (Latin 1)
encoding there is no need for conversion;
otherwise you need to specify an encoding (i.e. CP 1250)


From wikipedia

Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a
much wider array of characters, and their various encoding forms have
begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments.
While ASCII is limited to 128 characters, Unicode and the UCS support
more characters by separating the concepts of unique identification
(using natural numbers called code points) and encoding (to 8-, 16- or
32-bit binary formats, called UTF-8, UTF-16 and UTF-32).
To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1
(Latin 1) characters are assigned Unicode/UCS code points that are the
same as their codes in the earlier standards. Therefore, ASCII can be
considered a 7-bit encoding scheme for a very small subset of
Unicode/UCS, and, conversely, the UTF-8 encoding forms are
binary-compatible with ASCII for code points below 128, meaning all
ASCII is valid UTF-8. The other encoding forms resemble ASCII in how
they represent the first 128 characters of Unicode, but use 16 or 32
bits per character, so they require conversion for compatibility.
(similarly UCS-2 is upwards compatible with UTF-16)

If you have iconv, converting between encodings is easy --- you can always
call it as an external program with os.execute(cmd)
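
A minimal sketch of that approach in plain Lua follows. The helpers `shell_quote` and `iconv_command` and the file names are made up for illustration, and the quoting assumes a POSIX-style shell, so it would need adapting on Windows. The call itself is left commented out so that the sketch runs without iconv installed.

```lua
-- Quote a file name for a POSIX shell (assumption: sh-compatible shell).
local function shell_quote(s)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Build an iconv command line that converts infile into outfile.
local function iconv_command(from, to, infile, outfile)
  return string.format("iconv -f %s -t %s %s > %s",
    from, to, shell_quote(infile), shell_quote(outfile))
end

local cmd = iconv_command("CP1250", "UTF-8", "a.txt", "a-utf8.txt")
print(cmd)
-- os.execute(cmd)  -- uncomment to actually run the conversion
```

Writing to a separate output file keeps the original 8-bit file untouched for the other program that still needs it.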

-- 
luigi

Re: [NTG-context] UTF conversion via Lua

2012-02-10 Thread Wolfgang Schuster

Am 10.02.2012 um 12:11 schrieb Procházka Lukáš Ing. - Pontex s. r. o.:

 ... Well, my information was not correct.
 
 There are characters > 127 in the file, like ř, š...
 
 Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are 
 displayed correctly.
 
 But I have a problem loading them into ConTeXt.
 
 I need to convert the bytes > 127 to UTF sequences which would be accepted 
 by ConTeXt.
 
 @Thomas:
 
 The table looks nice, but there are no entries for CP 1250 to UTF conversion.
 
 I prepared some tables: character conversion and removal of diacritics (see 
 the attachment);
 maybe it would be useful to include them in ConTeXt somehow.

Why don’t you let ConTeXt do the conversion:

\starttext

this is something in utf8

\startregime[cp1250]
\input filewithcp1250encoding
\stopregime

more text encoded in utf8

\stoptext

Wolfgang

Re: [NTG-context] UTF conversion via Lua

2012-02-10 Thread Philipp Gesang
On 2012-02-10 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
 ... Well, my information was not correct.
 
 There are characters > 127 in the file, like ř, š...
 
 Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are 
 displayed correctly.

So it wasn’t ASCII after all ;-) No problem, just use iconv:

 iconv -f CP1250 -t UTF8 infile > outfile

I do this a lot with movie subtitles …

Hth, Philipp


PS: If you still insist on converting at the Lua end only then
your starting point might be “regi-cp1250.lua” in the
Context base/ dir.




 

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments



Re: [NTG-context] UTF conversion via Lua

2012-02-10 Thread Procházka Lukáš Ing . - Pontex s . r . o .

... \enableregime - nice idea!

Despite this, I'm still not able to make the example work:

 Test.mkiv
\enableregime[cp1250]

\starttext
  \startluacode
    function loadFile(fn)
      local fh = assert(io.open(fn, "r"))
      local str = fh:read("*all")

      fh:close()

      return str
    end

    context.startregime{"cp1250"}
      context(loadFile("a.txt"))
    context.stopregime()
  \stopluacode
\stoptext


Where's the problem?

Lukas





--
Ing. Lukáš Procházka [mailto:l...@pontex.cz]
Pontex s. r. o.  [mailto:pon...@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038


Test.mkiv
Description: Binary data

Re: [NTG-context] UTF conversion via Lua

2012-02-10 Thread Procházka Lukáš Ing . - Pontex s . r . o .

One more note -

On Fri, 10 Feb 2012 12:15:29 +0100, Wolfgang Schuster 
schuster.wolfg...@googlemail.com wrote:


Why don’t you let ConTeXt do the conversion:

\starttext

this is something in utf8

\startregime[cp1250]
\input filewithcp1250encoding
\stopregime

more text encoded in utf8

\stoptext


I cannot \input the file as it is not a valid ConTeXt source.

I do (at least) a % -> \% conversion;
that's why I need to use Lua to load the file into a string;
the conversion step was removed - to keep it simple - from the sample sent 
previously.
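
The kind of preprocessing step meant here might look like the following plain-Lua sketch. The function name `escape_percent` is made up, and it only covers the % case mentioned above; it does not check for percent signs that are already escaped.

```lua
-- Escape TeX comment characters so that a literal % survives typesetting.
local function escape_percent(s)
  -- "%%" matches a literal percent sign; in the replacement, "\\%%"
  -- yields a backslash followed by a percent sign.
  return (s:gsub("%%", "\\%%"))
end

print(escape_percent("100% sure"))
```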

Lukas


--
Ing. Lukáš Procházka [mailto:l...@pontex.cz]
Pontex s. r. o.  [mailto:pon...@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038


Re: [NTG-context] UTF conversion via Lua

2012-02-10 Thread Wolfgang Schuster

Am 10.02.2012 um 12:32 schrieb Procházka Lukáš Ing. - Pontex s. r. o.:

 ... \enableregime - nice idea!
 
 Despite this, I'm still not able to make the example work:
 
 Test.mkiv
 \enableregime[cp1250]
 
 \starttext
 \startluacode
 function loadFile(fn)
   local fh = assert(io.open(fn, "r"))
   local str = fh:read("*all")
 
   fh:close()
 
   return str
 end
 
 context.startregime{"cp1250"}
   context(loadFile("a.txt"))
 context.stopregime()
 \stopluacode
 \stoptext
 
 
 Where's the problem?


I don't know, but it works when you use “regimes.translate” in your code.
It would be better, though, to ask Hans for a function in the commands
namespace that you can use.

\starttext

\startluacode

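function loadFile(fn)
    -- read the whole file into a string
    local fh = assert(io.open(fn, "r"))
    local str = fh:read("*all")
    fh:close()
    -- convert from cp1250 to UTF-8 before passing the text to TeX
    str = regimes.translate(str,"cp1250")
    context(str)
end

loadFile("a.txt")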

\stopluacode

\stoptext

Wolfgang


Re: [NTG-context] UTF conversion via Lua

2012-02-10 Thread Procházka Lukáš Ing . - Pontex s . r . o .

Thank you, Wolfgang.

Your code works perfectly and does exactly what I need.

Best regards,

Lukas


--
Ing. Lukáš Procházka [mailto:l...@pontex.cz]
Pontex s. r. o.  [mailto:pon...@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038


Re: [NTG-context] UTF conversion via Lua

2012-02-10 Thread Hans Hagen

On 10-2-2012 14:15, Procházka Lukáš Ing. - Pontex s. r. o. wrote:

Thank you, Wolfgang.

Your code works perfectly and does exactly what I need.


As a one-liner ...

function document.MyLoadFile(name)
    context(regimes.translate(io.loaddata(resolvers.findfile(name)),"cp1250"))
end

(the resolvers will look the file up in the tree if needed)
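
A slightly more defensive variant of the one-liner above could look like the following sketch. The name document.MyLoadFileChecked, the optional regime argument and the false-plus-message return convention are hypothetical additions. It also assumes that resolvers.findfile yields an empty string (or nil) when nothing is found and that io.loaddata yields nil when the file cannot be read; that behaviour may differ between ConTeXt versions, so it is worth testing.

```lua
-- `document` is a predefined table in ConTeXt MkIV; the fallback only
-- matters when this snippet is read or run outside ConTeXt.
document = document or {}

-- Hypothetical checked variant: returns true on success,
-- or false plus a message when the file cannot be found or read.
function document.MyLoadFileChecked(name, regime)
  -- Assumption: an empty string or nil means "not found".
  local fullname = resolvers.findfile(name)
  if not fullname or fullname == "" then
    return false, "file not found: " .. name
  end
  -- Assumption: nil means the file could not be read.
  local data = io.loaddata(fullname)
  if not data then
    return false, "cannot read: " .. fullname
  end
  -- Convert to UTF-8 (cp1250 by default) and pass the text to TeX.
  context(regimes.translate(data, regime or "cp1250"))
  return true
end
```

On the TeX side it could then be called with \ctxlua{document.MyLoadFileChecked("a.txt")}, where a.txt is the sample file used earlier in this thread.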

Hans


-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-