Re: [NTG-context] Support for Thai in ConTeXt

2013-05-15 Thread Mojca Miklavec
On Tue, May 14, 2013 at 6:17 PM, Hans Hagen wrote:
 On 5/14/2013 6:07 PM, luigi scarso wrote:

 I Hope  that someone can help here


 as Mojca mentioned thai at bachotex i'll add the patterns as a start

 given specs, examples and time, adding support for thai to context shouldn't
 be too hard (assuming that there are users)

But it's not trivial either.

There's an opensource project implementing word segmentation:
http://linux.thai.net/projects/swath
The specification (someone's thesis) can be found here:
http://www.cs.cmu.edu/~paisarn/papers/thesis99.pdf

The ugly part of the pdfTeX approach is that it requires an external text
processor to digest an input TeX document and return a copy with word
segmentation. Then pdfTeX is run on the resulting file. XeTeX can use
the ICU library to do the segmentation.

In LuaTeX one would have to plug the word segmentation somewhere (but
writing that part is slightly non-trivial).

Mojca
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___


Re: [NTG-context] Support for Thai in ConTeXt

2013-05-15 Thread Hans Hagen

On 5/15/2013 4:09 PM, Mojca Miklavec wrote:

 On Tue, May 14, 2013 at 6:17 PM, Hans Hagen wrote:
  On 5/14/2013 6:07 PM, luigi scarso wrote:

   I Hope  that someone can help here

  as Mojca mentioned thai at bachotex i'll add the patterns as a start

  given specs, examples and time, adding support for thai to context shouldn't
  be too hard (assuming that there are users)

 But it's not trivial either.

It depends ... we're using a dictionary to determine word boundaries,
aren't we? I'm pretty sure that I've done more complex coding.

 There's an opensource project implementing word segmentation:
 http://linux.thai.net/projects/swath
 The specification (someone's thesis) can be found here:
 http://www.cs.cmu.edu/~paisarn/papers/thesis99.pdf

Ok, so there are some text files there with words.

 The ugly part of the pdfTeX approach is that it requires an external text
 processor to digest an input TeX document and return a copy with word
 segmentation. Then pdfTeX is run on the resulting file. XeTeX can use
 the ICU library to do the segmentation.

 In LuaTeX one would have to plug the word segmentation somewhere (but
 writing that part is slightly non-trivial).

I just did a quick test using those dictionaries (abusing some code that 
i already had on my machine). Quite doable. It all depends on having the 
dictionaries available (on the garden or in the distribution).
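
As a rough illustration of the dictionary idea (this is not the code Hans tested, and not what swath implements), a greedy longest-match segmenter can be sketched in a few lines of Python. The function name `segment` and the tiny word lists are made up for the example; a real word list would come from the dictionaries mentioned above.

```python
def segment(text: str, words: set[str]) -> list[str]:
    """Greedy longest-match segmentation.

    At each position the longest dictionary word is taken; characters that
    start no known word are emitted as single-character tokens.
    """
    longest = max((len(w) for w in words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one character
        # Try candidates from the longest possible length down to 2.
        for length in range(min(longest, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in words:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Toy dictionaries, assumed for the example only.
thai = segment("ภาษาไทย", {"ภาษา", "ไทย"})
toy = segment("thecatsat", {"the", "cat", "cats", "sat", "at"})
```

The second call shows why the problem is "not trivial either": the greedy strategy picks "cats" + "at" where a reader would expect "cat" + "sat", so a usable implementation needs some way of resolving such ambiguities.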


Anyhow, it's not that much font related, just language / script support 
and we already have that for some languages and adding thai to it 
doesn't hurt. Of course we'd need some testing. It doesn't make much 
sense to add features to context that no one would use at some point.


But ... Luigi is already teaching himself Thai, so ...

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] Support for Thai in ConTeXt

2013-05-15 Thread luigi scarso
On Wed, May 15, 2013 at 5:20 PM, Hans Hagen pra...@wxs.nl wrote:


 But ... Luigi is already teaching himself Thai, so ...

no no, just connecting people on different ml.
Currently I'm in a completely different area
-- 
luigi

[NTG-context] Support for Thai in ConTeXt

2013-05-14 Thread luigi scarso
On Tue, May 14, 2013 at 5:59 PM, Theppitak Karoonboonyanan 
theppi...@gmail.com wrote:

 On Tue, May 14, 2013 at 9:58 PM, luigi scarso luigi.sca...@gmail.com
 wrote:
 
  On Tue, May 14, 2013 at 4:16 PM, Mojca Miklavec
  mojca.miklavec.li...@gmail.com wrote:
 
  I could also ask differently: suppose that a motivated Thai programmer
  would be willing to work on solving the problem properly. What would
  be the suggested solution?
 
  You can also post on the ConTeXt list; maybe there is some Thai user there.

 I am a Thai developer who works on Thai word segmentation tools and
 thailatex package. So, you can suggest to me. (Please Cc: me, I'm not
 in the mailing list.)

 I'm totally new to LuaTeX and the Lua programming language. But I can learn
 whatever is necessary to get it done.

 With a quick search, I saw the linebreak_filter callback in the LuaTeX
 reference. Is that relevant to the problem? Or is using an external filter
 already acceptable?

 Regards,
 --
 Theppitak Karoonboonyanan
 http://linux.thai.net/~thep/


I hope that someone can help here

-- 
luigi

Re: [NTG-context] Support for Thai in ConTeXt

2013-05-14 Thread Hans Hagen

On 5/14/2013 6:07 PM, luigi scarso wrote:


I Hope  that someone can help here


as Mojca mentioned thai at bachotex i'll add the patterns as a start

given specs, examples and time, adding support for thai to context 
shouldn't be too hard (assuming that there are users)


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] bug with numeral conversion function

2008-09-02 Thread Khaled Hosny
On Mon, Sep 01, 2008 at 07:26:43PM +0200, Khaled Hosny wrote:
 Now, I think I discovered another bug (or feature?), the function will
 ignore any zeros at the left which isn't what one expects.

This happens to be something in Lua itself:
s = 000123 print(s)
will give 123, so the value has to be a string to keep the leading zeros.

I rewrote converters.alphabetic() and converters.Alphabetic() so
that they no longer expect a number and just iterate through the given
string; \arabicnumerals and its siblings now pass strings to
them.
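
The effect is not specific to Lua: any conversion that passes through a number loses the leading zeros. A minimal Python sketch of the string-based approach follows, assuming the target digits sit in ten consecutive code points as the Thai digits do (U+0E50 to U+0E59). The name `to_script_digits` is hypothetical and this is not the code from the patch.

```python
THAI_ZERO = 0x0E50  # U+0E50 THAI DIGIT ZERO; digits 1-9 follow consecutively


def to_script_digits(digits: str, zero: int = THAI_ZERO) -> str:
    """Map each ASCII digit character to the matching digit of another script."""
    out = []
    for ch in digits:
        if not ("0" <= ch <= "9"):
            raise ValueError(f"not an ASCII digit: {ch!r}")
        out.append(chr(zero + ord(ch) - ord("0")))
    return "".join(out)


# Working on the string keeps the zeros ...
as_string = to_script_digits("000123")
# ... while a round trip through a number drops them.
as_number = to_script_digits(str(int("000123")))
```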

Also I found that \abjadnumerals was referring to
converters.arabicnumerals, which doesn't exist. I changed it to
converters.abjadnumerals, but there is no converters.abjadnaivenumerals
and I've no idea what it is supposed to do.

See the attached patch and tell me what you think.


-- 
 Khaled Hosny
 Arabic localizer and member of Arabeyes.org team
diff -Naur cont-tmf/tex/context/base/core-con.lua cont-tmf.local/tex/context/base/core-con.lua
--- cont-tmf/tex/context/base/core-con.lua	2008-06-24 23:02:50.0 +0300
+++ cont-tmf.local/tex/context/base/core-con.lua	2008-09-02 12:01:17.0 +0200
@@ -102,22 +102,22 @@
 texsprint(utfchar(n+m))
 end
 
-local function do_alphabetic(n,max,chr)
-if n > max then
-do_alphabetic(floor((n-1)/max),max,chr)
-n = (n-1)%max+1
-end
-characters.flush(chr(n))
-end
-
 function converters.alphabetic(n,code)
 local code = counters[code] or counters['**']
-do_alphabetic(n,#code,function(n) return code[n] or fallback end)
+for c in string.characters(n) do
+local c = c + 1
+	local chr = function(n) return code[n] or fallback end
+	characters.flush(chr(c))
+end
 end
 
 function converters.Alphabetic(n,code)
 local code = counters[code] or counters['**']
-do_alphabetic(n,#code,function(n) return characters.uccode(code[n] or fallback) end)
+for c in string.characters(n) do
+local c = c + 1
+	local chr = function(n) return characters.uccode(code[n] or fallback) end
+	characters.flush(chr(c))
+end
 end
 
 function converters.character(n)  converters.chr (n,96) end
diff -Naur cont-tmf/tex/context/base/core-con.mkiv cont-tmf.local/tex/context/base/core-con.mkiv
--- cont-tmf/tex/context/base/core-con.mkiv	2008-06-24 22:55:44.0 +0300
+++ cont-tmf.local/tex/context/base/core-con.mkiv	2008-09-02 12:17:49.0 +0200
@@ -17,8 +17,8 @@
 
 \def\romannumerals   #1{\ctxlua{converters.romannumerals(\number#1)}}
 \def\Romannumerals   #1{\ctxlua{converters.Romannumerals(\number#1)}}
-\def\abjadnumerals  #1{\ctxlua{converters.arabicnumerals(\number#1)}}
-\def\abjadnodotnumerals #1{\ctxlua{converters.arabicnodotnumerals(\number#1)}}
+\def\abjadnumerals  #1{\ctxlua{converters.abjadnumerals(\number#1)}}
+\def\abjadnodotnumerals #1{\ctxlua{converters.abjadnodotnumerals(\number#1)}}
 \def\abjadnaivenumerals #1{\ctxlua{converters.arabicnaivenumerals(\number#1)}}
 
 \defineconversion [romannumerals]  [\romannumerals]
@@ -32,8 +32,8 @@
 \def\characters#1{\ctxlua{converters.characters(\number#1)}}
 \def\Characters#1{\ctxlua{converters.Characters(\number#1)}}
 
-\def\languagecharacters#1{\ctxlua{converters.alphabetic(\number#1,\currentlanguage)}} % new
-\def\languageCharacters#1{\ctxlua{converters.Alphabetic(\number#1,\currentlanguage)}} % new
+\def\languagecharacters#1{\ctxlua{converters.alphabetic(#1,\currentlanguage)}} % new
+\def\languageCharacters#1{\ctxlua{converters.Alphabetic(#1,\currentlanguage)}} % new
 
 \def\getdayoftheweek#1#2#3{\normalweekday\ctxlua{converters.weekday(\number#1,\number#2,\number#3)}}
 \def\dayoftheweek   #1#2#3{\doconvertday{\ctxlua{converters.weekday(\number#1,\number#2,\number#3)}}}
@@ -73,19 +73,19 @@
 
 % we could use an auxiliary macro to save some bytes in the format
 %
-% \def\dolanguagecharacters#1#2{\ctxlua{converters.alphabetic(\number#2,#1)}}
+% \def\dolanguagecharacters#1#2{\ctxlua{converters.alphabetic(#2,#1)}}
 
 % this does not belong here, but in a lang-module
 
-\def\thainumerals  #1{\ctxlua{converters.alphabetic(\number#1,thai)}}
-\def\devanagarinumerals#1{\ctxlua{converters.alphabetic(\number#1,devanagari)}}
-\def\gurmurkhinumerals #1{\ctxlua{converters.alphabetic(\number#1,gurmurkhi)}}
-\def\gujaratinumerals  #1{\ctxlua{converters.alphabetic(\number#1,gujarati)}}
-\def\tibetannumerals   #1{\ctxlua{converters.alphabetic(\number#1,tibetan)}}
-\def\greeknumerals #1{\ctxlua{converters.alphabetic(\number#1,greek)}}
-\def\Greeknumerals #1{\ctxlua{converters.Alphabetic(\number#1,greek)}}
-\def\arabicnumerals#1{\ctxlua{converters.alphabetic(\number#1,arabic)}}
-\def\persiannumerals   #1{\ctxlua{converters.alphabetic(\number#1,persian)}}
+\def\thainumerals  #1{\ctxlua{converters.alphabetic(#1,thai)}}
+\def\devanagarinumerals#1{\ctxlua{converters.alphabetic(#1,devanagari)}}
+\def\gurmurkhinumerals #1{\ctxlua{converters.alphabetic(#1,gurmurkhi)}}
+\def\gujaratinumerals  #1{\ctxlua

Re: [NTG-context] Creating account on wiki contextgarden

2008-07-29 Thread luigi scarso
On Tue, Jul 29, 2008 at 7:55 PM, Hans Hagen [EMAIL PROTECTED] wrote:

 luigi scarso wrote:
  On Mon, Jul 28, 2008 at 10:28 PM, Mehdi Omidali [EMAIL PROTECTED]
 wrote:
 
  Hi everyone,
  I want to translate Context an excursion to farsi and went to
  http://wiki.contextgarden.net/ConTeXt_on_Excursion,_translations
  and tried to create an account to be able to access source files. I
  faced a problem in the anti automated account creation question
  which is something like
  (23 plus 8) times roman 'C'
  What must be inserted as the answer to such a problem. I tried
  everything but no success.
  Best Wishes.
 
 
  I must admit that I will feel confused if the question will be mixed with
  ancient maya numbers .

 well, you're an original 'roman' guy so you'll get the easy creation
 question


Better to say no, otherwise one could argue that I'm also at ease with
Etruscan numerals
http://en.wikipedia.org/wiki/Etruscan_numerals

BTW, some linear equations like
x =  - IV
can be problematic (actually {'nulla', 'N'} are valid solutions, but that's
a historical matter)
http://en.wikipedia.org/wiki/Roman_numerals

One could say that such questions should be avoided because there is no
reason for non-Roman people to know about Roman numerals
(no more than for non-Maya people to know about Maya numerals),
and that is generally true.

But given that we are talking about ConTeXt, and given that \romannumerals
is a ConTeXt macro,
in this particular case such questions are valid.

This opens the door to similar questions (cf. core-con.lua, core-con.tex for
Persian, Thai etc.),
and given that Unicode will sooner or later cover all the writing
systems of the human race,
I expect that some day some questions will be mixed with Maya numerals.



-- 
luigi


Re: [NTG-context] beta

2006-04-08 Thread Vit Zyka
Mojca Miklavec wrote:
 On 4/4/06, Taco Hoekwater wrote:
  Hans Hagen wrote:
   Hi,

   - for mojca: take a look at regi-syn and let me know what vectors need
   to be added to the distribution

  Mojca, it would be nice if you could give a go/nogo signal quickly.
  I am slowly getting drowned with all the diff files so I am really
  eager to have Hans go ahead and release a new version :)
 
 
 Taco & Hans: I'm really really really sorry. I didn't notice that
 question in thousands of mails on the list.
 
 Thanks a lot for adding the file, Hans!
 
 This line
 \defineregimesynonym[cp-1250] [cp1250]
 is not really needed: I never spotted any cp125* with a hyphen
 inbetween (in contrast to utf or iso encodings), otherwise everything
 seems to be working ok.
 
 \defineregimesynonym[1250] [cp1250]
 is also OK (didn't thought about it ;).
 
 
 If you're asking me about the other changes: here's the same list that
 I already suggested:
 
 renaming:
 
 windows - cp1252
 il1 - iso-8858-1
 latin2 - iso-8858-2
 iso88595 - iso-8858-5
                  ^^
Everywhere should be 8859!

Everything else seems all right to me.
Vit

 grk - iso-8859-7

 
 
 And then adding the following definitions (cp1250 is already there):
 
 \defineregimesynonym[utf-8][utf]
 \defineregimesynonym[utf8][utf]
 
 \defineregimesynonym[windows-1250][cp1250]
 \defineregimesynonym[windows-1251][cp1251]
 \defineregimesynonym[windows-1252][cp1252]
 \defineregimesynonym[windows-1253][cp1253]
 \defineregimesynonym[windows-1254][cp1254]
 %defineregimesynonym[windows-1255][cp1255] % not supported yet (Hebrew)
 %defineregimesynonym[windows-1256][cp1256] % not supported yet (Arabic)
 \defineregimesynonym[windows-1257][cp1257]
 %defineregimesynonym[windows-1258][cp1258] % not supported yet (Vietnamese)
 
 % for historical reasons / compatibility
 \defineregimesynonym[windows][cp1252]
 
 % 5 - Cyrillic
 % 6 - Arabic (not supported)
 % 7 - Greek
 % 8 - Hebrew (3 signs missing)
 % 11 - Thai (not supported)
 
 \defineregimesynonym[il1][iso-8859-1]
 \defineregimesynonym[il2][iso-8859-2]
 \defineregimesynonym[il3][iso-8859-3]
 \defineregimesynonym[il4][iso-8859-4]
 \defineregimesynonym[il5][iso-8859-9]
 \defineregimesynonym[il6][iso-8859-10]
 \defineregimesynonym[il7][iso-8859-13]
 %defineregimesynonym[il8][iso-8859-14]
 \defineregimesynonym[il9][iso-8859-15]
 \defineregimesynonym[il10][iso-8859-16]
 
 \defineregimesynonym[latin1][iso-8859-1]
 \defineregimesynonym[latin2][iso-8859-2]
 \defineregimesynonym[latin3][iso-8859-3]
 \defineregimesynonym[latin4][iso-8859-4]
 \defineregimesynonym[latin5][iso-8859-9]
 \defineregimesynonym[latin6][iso-8859-10]
 \defineregimesynonym[latin7][iso-8859-13]
 %defineregimesynonym[latin8][iso-8859-14]
 \defineregimesynonym[latin9][iso-8859-15]
 \defineregimesynonym[latin10][iso-8859-16]
 
 % for historical reasons / compatibility
 \defineregimesynonym[iso88595][iso-8859-5]
 \defineregimesynonym[grk][iso-8859-7]
 
 
 
 I don't know whether and how often people use all those encodings (I'm
 only pretty sure that people use the cp1250 one). LaTeX offers all of
 them for example. I would suggest at least to rename the five regimes
 mentioned above and to point to the more consistent names using
 synonyms. The mentioned regimes are all present on
 http://pub.mojca.org/tex/enco/contextbase/, so it's up to you wheter
 you add any of the other regimes to the distribution or perhaps better
 wait till someone requests them. (There are so many files that taking
 them all would almost require a separate folder.) I'm happy now that
 cp1250 is in and I'm not using any other regime, so it's really not my
 decision.
 
 As far as I remember there were also some inconsistencies in the
 present greek and cyrillic regime.
 http://pub.mojca.org/tex/enco/contextbase/regi-vis.tex is slightly
 different than the file in the distro (uses named glyphs), but
 conceptually the same.
 
 Mojca
 ___
 ntg-context mailing list
 ntg-context@ntg.nl
 http://www.ntg.nl/mailman/listinfo/ntg-context
 

-- 
===
Ing. Vít Zýka, Ph.D. TYPOkvítek

database publishing  databazove publikovani
data maintaining and typesetting in typographic quality
priprava dat a jejich sazba v typograficke kvalite

tel.: (+420) 777 198 189 www: http://typokvitek.com
===



Re: [NTG-context] beta

2006-04-07 Thread Mojca Miklavec
On 4/4/06, Taco Hoekwater wrote:


 Hans Hagen wrote:
  Hi,
 

 - for mojca: take a look at regi-syn and let me know what vectors need
 to be be added to the distribution


 Mojca, it would be nice if you could give a go/nogo signal quickly.
 I am slowly getting drowned with all the diff files so I am really
 eager to have Hans go ahead and release a new version :)

Taco & Hans: I'm really really really sorry. I didn't notice that
question in thousands of mails on the list.

Thanks a lot for adding the file, Hans!

This line
\defineregimesynonym[cp-1250] [cp1250]
is not really needed: I never spotted any cp125* with a hyphen
in between (in contrast to utf or iso encodings); otherwise everything
seems to be working OK.

\defineregimesynonym[1250] [cp1250]
is also OK (I didn't think about it ;).


If you're asking me about the other changes: here's the same list that
I already suggested:

renaming:

windows - cp1252
il1 - iso-8858-1
latin2 - iso-8858-2
iso88595 - iso-8858-5
grk - iso-8859-7


And then adding the following definitions (cp1250 is already there):

\defineregimesynonym[utf-8][utf]
\defineregimesynonym[utf8][utf]

\defineregimesynonym[windows-1250][cp1250]
\defineregimesynonym[windows-1251][cp1251]
\defineregimesynonym[windows-1252][cp1252]
\defineregimesynonym[windows-1253][cp1253]
\defineregimesynonym[windows-1254][cp1254]
%defineregimesynonym[windows-1255][cp1255] % not supported yet (Hebrew)
%defineregimesynonym[windows-1256][cp1256] % not supported yet (Arabic)
\defineregimesynonym[windows-1257][cp1257]
%defineregimesynonym[windows-1258][cp1258] % not supported yet (Vietnamese)

% for historical reasons / compatibility
\defineregimesynonym[windows][cp1252]

% 5 - Cyrillic
% 6 - Arabic (not supported)
% 7 - Greek
% 8 - Hebrew (3 signs missing)
% 11 - Thai (not supported)

\defineregimesynonym[il1][iso-8859-1]
\defineregimesynonym[il2][iso-8859-2]
\defineregimesynonym[il3][iso-8859-3]
\defineregimesynonym[il4][iso-8859-4]
\defineregimesynonym[il5][iso-8859-9]
\defineregimesynonym[il6][iso-8859-10]
\defineregimesynonym[il7][iso-8859-13]
%defineregimesynonym[il8][iso-8859-14]
\defineregimesynonym[il9][iso-8859-15]
\defineregimesynonym[il10][iso-8859-16]

\defineregimesynonym[latin1][iso-8859-1]
\defineregimesynonym[latin2][iso-8859-2]
\defineregimesynonym[latin3][iso-8859-3]
\defineregimesynonym[latin4][iso-8859-4]
\defineregimesynonym[latin5][iso-8859-9]
\defineregimesynonym[latin6][iso-8859-10]
\defineregimesynonym[latin7][iso-8859-13]
%defineregimesynonym[latin8][iso-8859-14]
\defineregimesynonym[latin9][iso-8859-15]
\defineregimesynonym[latin10][iso-8859-16]

% for historical reasons / compatibility
\defineregimesynonym[iso88595][iso-8859-5]
\defineregimesynonym[grk][iso-8859-7]



I don't know whether and how often people use all those encodings (I'm
only pretty sure that people use the cp1250 one). LaTeX offers all of
them for example. I would suggest at least to rename the five regimes
mentioned above and to point to the more consistent names using
synonyms. The mentioned regimes are all present on
http://pub.mojca.org/tex/enco/contextbase/, so it's up to you whether
you add any of the other regimes to the distribution or perhaps better
wait till someone requests them. (There are so many files that taking
them all would almost require a separate folder.) I'm happy now that
cp1250 is in and I'm not using any other regime, so it's really not my
decision.

As far as I remember there were also some inconsistencies in the
present greek and cyrillic regime.
http://pub.mojca.org/tex/enco/contextbase/regi-vis.tex is slightly
different than the file in the distro (uses named glyphs), but
conceptually the same.
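
To make the intent of the synonym list concrete, here is a small Python sketch of how such a table resolves a user-facing name to a canonical regime. It only mirrors a handful of the \defineregimesynonym lines above; the names `SYNONYMS` and `resolve_regime` are hypothetical, and this is not how ConTeXt implements the macro.

```python
# A few entries copied from the \defineregimesynonym list above.
SYNONYMS = {
    "utf-8": "utf",
    "utf8": "utf",
    "windows-1250": "cp1250",
    "windows": "cp1252",
    "il1": "iso-8859-1",
    "latin1": "iso-8859-1",
    "il5": "iso-8859-9",
    "latin5": "iso-8859-9",
    "latin9": "iso-8859-15",
    "grk": "iso-8859-7",
}


def resolve_regime(name: str) -> str:
    """Follow synonym links until a name with no further synonym is reached."""
    seen = set()
    while name in SYNONYMS:
        if name in seen:
            raise ValueError(f"circular synonym: {name}")
        seen.add(name)
        name = SYNONYMS[name]
    return name


canonical = [resolve_regime(n) for n in ("windows", "latin5", "cp1250")]
```

Note that the "latinN" number and the ISO part number diverge from latin5 onwards (latin5 is iso-8859-9, not iso-8859-5), which is one reason to keep the iso-8859-N names as the canonical ones.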

Mojca


Re: [NTG-context] Character names (was: Context 2005.12.19 released)

2005-12-22 Thread Mojca Miklavec
Taco Hoekwater wrote:

 Here's what I can come up with. At least a few are acceptable, like the
 horizontal bar. \textnumero exists, but is only reachable in cyrillic
 encodings (fixable, I guess?), and the Greek & Vietnamese accents
 are also only usable in the correct encoding. I've used the \text...
 versions of the accents, but perhaps the actual commands are more
 correct (like \' and \~).

 Cheers, Taco

 \starttext
 \definecharacter texthorizontalbar {{--\kern 0pt--}}
 \definecharacter textdong  {\underbar{\dstroke}}

Thanks for those ...

 \NC 0300 COMBINING GRAVE ACCENT \NC \textgrave   \NC \NR
 \NC 0309 COMBINING HOOK ABOVE   \NC \texthookabove   \NC \NR
 \NC 0303 COMBINING TILDE\NC \texttilde   \NC \NR
 \NC 0301 COMBINING ACUTE ACCENT \NC \textacute   \NC \NR
 \NC 0323 COMBINING DOT BELOW\NC \textbottomdot   \NC \NR

I may be wrong, but aren't those used only in combination with other
characters? I don't know if TeX (ConTeXt) can handle this (at least
not yet). When I wrote the list a couple of days ago I forgot about
that fact. If the accent came before the character, this could
be replaced by \buildtextaccent..., but as it is there's perhaps no
solution without some additional macros. (And since the Vietnamese
seem to be satisfied with viscii and utf for now, supporting cp1258 is
not crucial.)
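
The combining behaviour can be checked against the Unicode character database. The following Python sketch uses only the standard `unicodedata` module and the five code points listed for windows-1258 in this thread; it illustrates the Unicode side of the problem only and says nothing about how ConTeXt would handle it.

```python
import unicodedata

# The five combining marks listed for windows-1258.
MARKS = ["\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]

# A non-zero canonical combining class means the mark attaches to the
# base character that precedes it in the text stream.
classes = {f"U+{ord(m):04X}": unicodedata.combining(m) for m in MARKS}

# Base letter first, mark second; NFC normalization composes the pair
# when a precomposed character exists.
composed = unicodedata.normalize("NFC", "a" + "\u0309")  # a with hook above
```

Since the mark follows its base letter, an input regime has to look at the next character before it can typeset the current one, which is why a plain accent macro that takes the base as its argument does not fit directly.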

I double-checked the differences between the existing regimes and the
ones that were automatically produced by a script. The list of regimes
that are ripe for supporting is thus:

cp125[ 0 | *1 | *2 | 3 | 4 | 7 ]
iso-8859-[ *1 | *2 | 3 | 4 | *5 | *7 | 9 | 13 | *15 | 16 ]
*viscii (with glyph names instead of \\u\...)

(The ones marked with a star are already supported, perhaps with some
inconsistencies. Not supported: Hebrew, Arabic, Vietnamese? for cp125X
and Arabic, Thai and Celtic for iso-8859-X.)

I'll send the files (full content is already on my page), but I need
to know how to split/group them (I guess it would be a bad idea to
have one file for each encoding). Should there be one file for
iso-8859 and one for windows encodings? What about those regimes that
are already supported? I would like to move at least the regi-win
(with 8 wrong definitions anyway) to a less discriminating place,
don't know what to do with Greek and Cyrillic.

And another set of questions:
1. Can someone check for (in)consistencies for
greekupsilondiaeresis vs. greekupsilondialytika?
Looks like the same glyph named differently at different places
(functionality may break).

2. What to do with
{\cyrillicGJE}   {\'\cyrillicG} % 0403 CYRILLIC CAPITAL LETTER GJE
{\cyrillicgje}   {\'\cyrillicg} % 0453 CYRILLIC SMALL LETTER GJE
{\cyrillicKJE}   {\'\cyrillicK} % 040C CYRILLIC CAPITAL LETTER KJE
{\cyrillickje}   {\'\cyrillick} % 045C CYRILLIC SMALL LETTER KJE
{\cyrillicgheupturn} {\cyrillicgup} % 0491 CYRILLIC SMALL LETTER GHE WITH UPTURN
Which variant is better?

Would it make sense to define
\definecharacter cyrillicGJE {\buildtextaccent\textacute\cyrillicG}
\defineaccent ' \cyrillicG {\cyrillicGJE}
and then use \cyrillicGJE consistently?

3.
PLEASE FIX:
in enco-def.tex replace \cdots by something (\dots, I suppose, but I'm not sure)
\definecharacter textellipsis {\mathematics\cdots}
(I guess this bug was the reason for changing some definitions in
regimes/encodings elsewhere.)

Should \textellipsis be used for 2026 HORIZONTAL ELLIPSIS or anything else?

4. \softhyphen, \hyphen or \- for 00AD SOFT HYPHEN?

5. Urgently: what to do with quotations (without language
discriminations if possible)?

% 201A SINGLE LOW-9 QUOTATION MARK
\quotesinglebase vs. \lowerleftsingleninequote
% 201E DOUBLE LOW-9 QUOTATION MARK
\quotedblbase vs. \lowerleftdoubleninequote
% 2018 LEFT SINGLE QUOTATION MARK
\quoteleft vs. \upperleftsinglesixquote
% 2019 RIGHT SINGLE QUOTATION MARK
\quoteright vs. \upperrightsingleninequote

% 201C LEFT DOUBLE QUOTATION MARK
\quotedblleft vs. \upperleftdoublesixquote
% 201D RIGHT DOUBLE QUOTATION MARK
\quotedblright vs. \upperrightdoubleninequote

% 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
\guilsingleleft vs. \leftsubguillemot
% 203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
\guilsingleright vs. \rightsubguillemot
% 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
\leftguillemot vs. \greekleftquot
(are Greek quotations treated specially or what is this doing in regi-grk?)
% 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
\rightguillemot vs. \greekrightquot vs. \prewordbreak\rightguillemot
(in my point of view the last one may be better, but not fair since
it's language dependent: may be OK for French, but not for German or
vice versa; perhaps a language-sensitive macro could be inserted at
this place?)

6. \textnumero, 0x2116 (and perhaps some other characters) should be
added to unicode vector 33.

7. files regi-il1 and regi-win have many inconsistencies. I would like
to suggest to do the following renamings:

windows - cp1252
il1 - iso

[NTG-context] Character names (was: Context 2005.12.19 released)

2005-12-21 Thread Mojca Miklavec
Hans Hagen wrote:
 Mojca Miklavec wrote:
 Taco Hoekwater wrote:
 
 New features since 2005.12.18:
 
 * Support for the latin-9 regime (latin-1 + euro)
 
 
 There are some more (automatically generated) regime definitions at
 http://pub.mojca.org/tex/enco/contextbase/
 (only from the glyph names that I was able to extract from the
 existing files, so it's only OK for some of the regimes mentioned
 there).
 
 If possible, I would like to ask for core support for windows-1250
 (perhaps other users may find some other regimes useful as well).
 
 
 just send me the files you feel confident with

(I'll send the good files soon.)

Except Celtic, Thai, Arabic and Hebrew (although the letter names for
Hebrew are almost completely defined) almost all the windows and ISO
regimes are OK, just some glyphs are missing (which are, or at least
were, missing in Unicode vectors as well). If anyone has suggestions
for names for the following characters, 6 additional regimes can be
fully supported:

windows-1251 and iso-8859-5
2116 NUMERO SIGN

windows-1253
0385 GREEK DIALYTIKA TONOS
2015 HORIZONTAL BAR
0384 GREEK TONOS

windows-1258
0300 COMBINING GRAVE ACCENT
0309 COMBINING HOOK ABOVE
0303 COMBINING TILDE
0301 COMBINING ACUTE ACCENT
0323 COMBINING DOT BELOW
20AB DONG SIGN

iso-8859-7
20AF DRACHMA SIGN
037A GREEK YPOGEGRAMMENI
2015 HORIZONTAL BAR
0384 GREEK TONOS
0385 GREEK DIALYTIKA TONOS

iso-8859-10
2015 HORIZONTAL BAR

Mojca


Re: [NTG-context] Character names (was: Context 2005.12.19 released)

2005-12-21 Thread Taco Hoekwater


Here's what I can come up with. At least a few are acceptable, like the
horizontal bar. \textnumero exists, but is only reachable in cyrillic
encodings (fixable, I guess?), and the Greek & Vietnamese accents
are also only usable in the correct encoding. I've used the \text...
versions of the accents, but perhaps the actual commands are more
correct (like \' and \~).

Cheers, Taco

\starttext
\definecharacter texthorizontalbar {{--\kern 0pt--}}
\definecharacter textdong  {\underbar{\dstroke}}

\starttabulate[|c|c|]
\NC 0300 COMBINING GRAVE ACCENT \NC \textgrave   \NC \NR
\NC 0309 COMBINING HOOK ABOVE   \NC \texthookabove   \NC \NR
\NC 0303 COMBINING TILDE\NC \texttilde   \NC \NR
\NC 0301 COMBINING ACUTE ACCENT \NC \textacute   \NC \NR
\NC 0323 COMBINING DOT BELOW\NC \textbottomdot   \NC \NR
\NC 037A GREEK YPOGEGRAMMENI\NC \unknownchar \NC \NR  % prime?
\NC 0384 GREEK TONOS\NC \greektonos  \NC \NR
\NC 0385 GREEK DIALYTIKA TONOS  \NC \greekdialytikatonos \NC \NR
\NC 2015 HORIZONTAL BAR \NC \texthorizontalbar   \NC \NR
\NC 20AB DONG SIGN  \NC \textdong\NC \NR
\NC 20AF DRACHMA SIGN   \NC \unknownchar \NC \NR
\NC 2116 NUMERO SIGN\NC \textnumero  \NC \NR
\stoptabulate
\stoptext




[NTG-context] towards some more consistency in regimes unicode support

2005-09-13 Thread Mojca Miklavec
 
the case in ConTeXt unless something has changed recently.
There are many other letters wrongly named in Unicode (with cedilla),
although they have a comma. I would suggest naming them
\[gklnr]commaaccent and using \[gklnr]cedilla as a synonym (if needed at
all for backward compatibility; otherwise it would be better to leave
them out. There is no such letter with cedilla in Unicode; if someone
needs one, they can construct it trivially with \buildtextaccent).


7. there's a-kind-of-bug-but-not-really-one in enco-ans.tex. 
textcedilla maps to 184, which isn't defined in Antykwa, for example 
(it's at position 24). It's more of a bug in the texnansi encoding, which has 
cedilla in two places, which is pretty stupid. But anyway:

\definecharacter textcedilla 24
would solve some problems (and hopefully not introduce new ones).

8. most letters are named
c with cedilla - ccedilla
what about the names for open o, turned e, long s, turned r with 
hook?

\openo or \oopen? \rturnedhook or \turnedrhook?

9. can latin letters and numbers be accessed somehow by name?

10. Adam prepared some dingbats support I think, this could be added here.

11. There's a showunicode pdf document on pragma-ade.com (at least I saw 
it once), but it's not listed on the overview.htm.


12. I don't know if anyone would ever need to switch from viscii regime 
to some other, but what would happen to the characters under 128 (some 
of them are redefined in viscii)? I'm afraid that Vietnamese leftovers 
would remain in the lower part of the table.


13. If there are any other comments on the table and/or the script(s), 
please let me know.



IV. With the help of the prepared names list I processed definitions for 
regimes (taken from Unicode webpage) for ISO-8859-* and cp125* (others 
should be trivial). They are only preliminary, some (Hebrew, Thai, 
Arabic) probably don't make any sense yet, but could the rest be added 
to ConTeXt after someone checks if everything is OK? (iso88595, cp1251, 
il1, il2, il9, windows and viscii regimes already exist and should be 
compared for differences)
If possible, this should be done in such a way that the regime definition 
file need not be included manually: just as \usemodule[pre-polish] 
finds and processes the proper file, \enableregime[xxx] should find 
the proper file and load it.


(And for those who made it till here - sorry again for that gigantic mail.)
Mojca


[NTG-context] Regimes to be supported; Comments?

2005-07-29 Thread Mojca Miklavec

Hello,

Some time ago there was a discussion about extending support for 
different regimes in ConTeXt. The list of (to-be-)supported regimes 
probably depends strongly on the implementation (ruby+iconv?). I 
collected a preliminary list of candidate regimes and possible synonyms 
(some synonyms are listed there for backward compatibility and have to 
remain there), leaving out most of eastern encodings (not because they 
shouldn't be on the list, but because I'm completely ignorant about that).


Hans suggested posting this to the mailing list first to get some useful 
comments and suggestions.


#

The following question should probably go in a separate thread, but it's 
on a very similar topic. In July 2006 Ljubljana will host people from 
around 85 countries of the world. One of the very ambitious organizers has 
been dreaming for a couple of years already of printing the participant names 
(on honourable mentions, for example) in both Latin transcription 
and as they are written in the original (under the assumption that the names 
are properly entered in a UTF-8 database). This is probably not possible 
to do for every single obscure language, but does it in general sound like:

a) Good luck (I don't want to be in your place)!
b) Take a good (commercial) program
c) If you're ready to invest the rest of your time (forget about 
hobbies!), it's probably doable in LaTeX or ConTeXt by then
č) Forget about TeX - it will be possible to solve this problem one day 
with Unicode and one of the new TeX engines. But until then, it's not 
worth the effort, because any effort you may invest will become obsolete 
in a couple of years.


To be honest, even some of the people who will translate the materials into 
their native language will probably do that with paper, pencil and scanner.


#


Mojca

And here are the encodings:

# ISO
ISO-8859-1  Western
ISO-8859-2  Central European
ISO-8859-3  South European
ISO-8859-4  Baltic
ISO-8859-5  Cyrillic
ISO-8859-6  Arabic
ISO-8859-7  Greek
ISO-8859-8  Hebrew Visual
ISO-8859-8-I Hebrew (???) What is that?
ISO-8859-9  Turkish
ISO-8859-10 Nordic
ISO-8859-11 Thai
ISO-8859-13 Baltic
ISO-8859-14 Celtic
ISO-8859-15 Western
ISO-8859-16 Romanian

\defineregimesynonym[il*][iso-8859-*], *=1-16\12
\defineregimesynonym[latin*][iso-8859-*], *=1-16\12
\defineregimesynonym[cp819][iso-8859-1]

% I'm not sure that anyone needs these:
\defineregimesynonym[iso-ir-100][iso-8859-1]
\defineregimesynonym[iso-ir-101][iso-8859-2]
\defineregimesynonym[iso-ir-109][iso-8859-3]
\defineregimesynonym[iso-ir-110][iso-8859-4]
\defineregimesynonym[iso-ir-144][iso-8859-5]
\defineregimesynonym[iso-ir-127][iso-8859-6]
\defineregimesynonym[iso-ir-126][iso-8859-7]
\defineregimesynonym[iso-ir-138][iso-8859-8]
\defineregimesynonym[iso-ir-148][iso-8859-9]
\defineregimesynonym[iso-ir-157][iso-8859-10]
\defineregimesynonym[iso-ir-179][iso-8859-13]
\defineregimesynonym[iso-ir-199][iso-8859-14]
\defineregimesynonym[iso-ir-203][iso-8859-15]
\defineregimesynonym[iso-ir-226][iso-8859-16]

% backward compatibility
\defineregimesynonym[iso88595][iso-8859-5]

(recode also recognises arabic, greek, cyrillic, hebrew as 
aliases for those encodings: I don't know if this is a good idea, as there 
are other charsets operating with the same language groups as well)
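
One way to sanity-check a synonym list like the one above is to resolve both names through an existing alias table and compare the results. The sketch below uses Python's `codecs.lookup` as that reference, which is an assumption: Python's table is not necessarily what ConTeXt should follow, and it does not know every iso-ir-* name. The helper names are hypothetical. Note that under the usual naming the latinN aliases coincide with iso-8859-N only for N = 1 to 4 (latin5 is ISO-8859-9, latin9 is ISO-8859-15), so the wildcard latin* line probably needs an explicit table.

```python
import codecs

def canonical(name):
    """Return Python's canonical codec name, or None if the name is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None

def same_encoding(synonym, target):
    """True if both names resolve to the same codec in Python's alias table."""
    a, b = canonical(synonym), canonical(target)
    return a is not None and a == b

if __name__ == "__main__":
    # A few of the proposed pairs, plus the latin5/latin9 cases.
    pairs = [
        ("cp819", "iso-8859-1"),
        ("iso-ir-101", "iso-8859-2"),
        ("latin5", "iso-8859-5"),
        ("latin5", "iso-8859-9"),
        ("latin9", "iso-8859-15"),
    ]
    for synonym, target in pairs:
        verdict = "ok" if same_encoding(synonym, target) else "MISMATCH or unknown"
        print(f"{synonym} -> {target}: {verdict}")
```

A name that Python does not recognise is reported the same way as a mismatch, so any flagged pair still has to be checked by hand against the Unicode or IANA tables.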


# APPLE
MacArabic
MacCeltic
MacCentralEuropean
% CentEur, CentralEurope or CentralEuropean? or all of them?
MacChineseSimplified
MacChineseTraditional
MacCroatian
MacCyrillic
MacDevanagari
MacDingbats
MacFarsi
MacGaelic
MacGreek
MacGujarati
MacGurmukhi
MacHebrew
MacIcelandic
MacInuit
MacJapanese
MacKeyboard
MacKorean
MacRoman
MacRomanian
MacSymbol
MacThai
MacTurkish
MacUkrainian

\defineregimesynonym[MacCE][MacCentralEuropean]
\defineregimesynonym[mac][MacRoman]
\defineregimesynonym[maccyr][MacCyrillic]
\defineregimesynonym[macukr][MacUkrainian]

(I also need some help here: sometimes Mac encodings are defined using 
adjectives, sometimes using nouns, like Ukraine/Ukrainian. Should only 
one of them (which?) be used, or both? On the Unicode page, Mac 
encodings appear twice, the second time under Microsoft/Apple, 
containing MacCyrillic, MacGreek, MacIceland, MacLatin2, MacRoman, 
MacTurkish. I didn't really get the point of that.)


# IBM
% essentially the same as under Microsoft, with some minor changes 
(to be processed manually, if these are to be supported)

# MICROSOFT
EBCDIC % plenty of them are missing on the web
cp037
cp500
cp875
cp1026
PC
cp437 LatinUS
cp737 Greek
cp775 BaltRim
cp850 Latin1
cp852 Latin2
cp855 Cyrillic
cp857 Turkish
cp860 Portuguese
cp861 Icelandic
cp862 Hebrew