Re: [NTG-context] Support for Thai in ConTeXt
On Tue, May 14, 2013 at 6:17 PM, Hans Hagen wrote:
> On 5/14/2013 6:07 PM, luigi scarso wrote:
>> I hope that someone can help here
>
> As Mojca mentioned Thai at BachoTeX, I'll add the patterns as a start.
>
> Given specs, examples and time, adding support for Thai to ConTeXt
> shouldn't be too hard (assuming that there are users).

But it's not trivial either.

There's an open-source project implementing word segmentation:
    http://linux.thai.net/projects/swath
The specification (someone's thesis) can be found here:
    http://www.cs.cmu.edu/~paisarn/papers/thesis99.pdf

The ugly part of the pdfTeX approach is that it requires an external text processor to digest the input TeX document and return a copy with word segmentation; pdfTeX is then run on the resulting file. XeTeX can use the ICU library to do the segmentation. In LuaTeX one would have to plug the word segmentation in somewhere (but writing that part is slightly non-trivial).

Mojca
___
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___
Re: [NTG-context] Support for Thai in ConTeXt
On 5/15/2013 4:09 PM, Mojca Miklavec wrote:
> On Tue, May 14, 2013 at 6:17 PM, Hans Hagen wrote:
>> On 5/14/2013 6:07 PM, luigi scarso wrote:
>>> I hope that someone can help here
>>
>> As Mojca mentioned Thai at BachoTeX, I'll add the patterns as a start.
>>
>> Given specs, examples and time, adding support for Thai to ConTeXt
>> shouldn't be too hard (assuming that there are users).
>
> But it's not trivial either.

It depends ... we're using a dictionary to determine word boundaries, aren't we? I'm pretty sure that I've done more complex coding.

> There's an open-source project implementing word segmentation:
>     http://linux.thai.net/projects/swath
> The specification (someone's thesis) can be found here:
>     http://www.cs.cmu.edu/~paisarn/papers/thesis99.pdf

Ok, so there are some text files there with words.

> The ugly part of the pdfTeX approach is that it requires an external text
> processor to digest the input TeX document and return a copy with word
> segmentation; pdfTeX is then run on the resulting file. XeTeX can use the
> ICU library to do the segmentation. In LuaTeX one would have to plug the
> word segmentation in somewhere (but writing that part is slightly
> non-trivial).

I just did a quick test using those dictionaries (abusing some code that I already had on my machine). Quite doable. It all depends on having the dictionaries available (on the garden or in the distribution).

Anyhow, it's not that much font related, just language / script support. We already have that for some languages, and adding Thai to it doesn't hurt.

Of course we'd need some testing. It doesn't make much sense to add features to ConTeXt that no one would use at some point. But ... Luigi is already teaching himself Thai, so ...

Hans

-----------------------------------------------------------------
 Hans Hagen | PRAGMA ADE
 Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
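The dictionary-based word-boundary idea discussed in this message can be sketched roughly as follows. This is only an illustrative greedy longest-match segmenter in Python, not the algorithm used by swath and not the code Hans tested; the function name `segment` and the tiny word sets are hypothetical.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation.

    Characters that start no dictionary word become one-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical three-word list; a real one would be loaded from dictionary files.
thai_words = {"ภาษา", "ไทย", "สวัสดี"}
print(segment("ภาษาไทย", thai_words))
```

A greedy match can pick the wrong boundary when a longer word hides a better split, which is one reason real segmenters are more involved than this sketch.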
Re: [NTG-context] Support for Thai in ConTeXt
On Wed, May 15, 2013 at 5:20 PM, Hans Hagen <pra...@wxs.nl> wrote:
> But ... Luigi is already teaching himself Thai, so ...

No, no, just connecting people on different mailing lists. Currently I'm in a completely different area.

--
luigi
[NTG-context] Support for Thai in ConTeXt
On Tue, May 14, 2013 at 5:59 PM, Theppitak Karoonboonyanan <theppi...@gmail.com> wrote:
> On Tue, May 14, 2013 at 9:58 PM, luigi scarso <luigi.sca...@gmail.com> wrote:
>> On Tue, May 14, 2013 at 4:16 PM, Mojca Miklavec <mojca.miklavec.li...@gmail.com> wrote:
>>> I could also ask differently: suppose that a motivated Thai programmer
>>> would be willing to work on solving the problem properly. What would be
>>> the suggested solution?
>>
>> You can also post to the ConTeXt mailing list; maybe there is some Thai
>> user there.
>
> I am a Thai developer who works on Thai word segmentation tools and the
> thailatex package. So you can make suggestions to me. (Please Cc: me, I'm
> not on the mailing list.)
>
> I'm totally new to LuaTeX and the Lua programming language, but I can
> learn the necessary stuff to get it done. With a quick search, I saw the
> linebreak_filter callback in the LuaTeX reference. Is that relevant to the
> problem? Or is using an external filter already acceptable?
>
> Regards,
> --
> Theppitak Karoonboonyanan
> http://linux.thai.net/~thep/

I hope that someone can help here.

--
luigi
Re: [NTG-context] Support for Thai in ConTeXt
On 5/14/2013 6:07 PM, luigi scarso wrote:
> I hope that someone can help here

As Mojca mentioned Thai at BachoTeX, I'll add the patterns as a start.

Given specs, examples and time, adding support for Thai to ConTeXt shouldn't be too hard (assuming that there are users).

Hans
Re: [NTG-context] bug with numeral conversion function
On Mon, Sep 01, 2008 at 07:26:43PM +0200, Khaled Hosny wrote:
> Now, I think I discovered another bug (or feature?): the function will
> ignore any zeros at the left, which isn't what one expects.

This happens to be something in Lua itself:

    s = 000123
    print(s)

will give 123, so the value has to be a string to keep the leading zeros.

I rewrote converters.alphabetic() and converters.Alphabetic() so that they no longer expect a number and just iterate through the given string, and now \arabicnumerals and its brothers pass strings to them.

I also found that \abjadnumerals was referring to converters.arabicnumerals, which doesn't exist. I changed it to converters.abjadnumerals, but there is no converters.abjadnaivenumerals and I have no idea what it is supposed to do.

See the attached patch and tell me what you think.

--
Khaled Hosny
Arabic localizer and member of Arabeyes.org team

diff -Naur cont-tmf/tex/context/base/core-con.lua cont-tmf.local/tex/context/base/core-con.lua
--- cont-tmf/tex/context/base/core-con.lua 2008-06-24 23:02:50.0 +0300
+++ cont-tmf.local/tex/context/base/core-con.lua 2008-09-02 12:01:17.0 +0200
@@ -102,22 +102,22 @@
     texsprint(utfchar(n+m))
 end
 
-local function do_alphabetic(n,max,chr)
-    if n > max then
-        do_alphabetic(floor((n-1)/max),max,chr)
-        n = (n-1)%max+1
-    end
-    characters.flush(chr(n))
-end
-
 function converters.alphabetic(n,code)
     local code = counters[code] or counters['**']
-    do_alphabetic(n,#code,function(n) return code[n] or fallback end)
+    for c in string.characters(n) do
+        local c = c + 1
+        local chr = function(n) return code[n] or fallback end
+        characters.flush(chr(c))
+    end
 end
 
 function converters.Alphabetic(n,code)
     local code = counters[code] or counters['**']
-    do_alphabetic(n,#code,function(n) return characters.uccode(code[n] or fallback) end)
+    for c in string.characters(n) do
+        local c = c + 1
+        local chr = function(n) return characters.uccode(code[n] or fallback) end
+        characters.flush(chr(c))
+    end
 end
 
 function converters.character(n)
     converters.chr (n,96)
 end
diff -Naur cont-tmf/tex/context/base/core-con.mkiv cont-tmf.local/tex/context/base/core-con.mkiv
--- cont-tmf/tex/context/base/core-con.mkiv 2008-06-24 22:55:44.0 +0300
+++ cont-tmf.local/tex/context/base/core-con.mkiv 2008-09-02 12:17:49.0 +0200
@@ -17,8 +17,8 @@
 \def\romannumerals #1{\ctxlua{converters.romannumerals(\number#1)}}
 \def\Romannumerals #1{\ctxlua{converters.Romannumerals(\number#1)}}
 
-\def\abjadnumerals #1{\ctxlua{converters.arabicnumerals(\number#1)}}
-\def\abjadnodotnumerals #1{\ctxlua{converters.arabicnodotnumerals(\number#1)}}
+\def\abjadnumerals #1{\ctxlua{converters.abjadnumerals(\number#1)}}
+\def\abjadnodotnumerals #1{\ctxlua{converters.abjadnodotnumerals(\number#1)}}
 \def\abjadnaivenumerals #1{\ctxlua{converters.arabicnaivenumerals(\number#1)}}
 
 \defineconversion [romannumerals] [\romannumerals]
@@ -32,8 +32,8 @@
 \def\characters#1{\ctxlua{converters.characters(\number#1)}}
 \def\Characters#1{\ctxlua{converters.Characters(\number#1)}}
 
-\def\languagecharacters#1{\ctxlua{converters.alphabetic(\number#1,\currentlanguage)}} % new
-\def\languageCharacters#1{\ctxlua{converters.Alphabetic(\number#1,\currentlanguage)}} % new
+\def\languagecharacters#1{\ctxlua{converters.alphabetic(#1,\currentlanguage)}} % new
+\def\languageCharacters#1{\ctxlua{converters.Alphabetic(#1,\currentlanguage)}} % new
 
 \def\getdayoftheweek#1#2#3{\normalweekday\ctxlua{converters.weekday(\number#1,\number#2,\number#3)}}
 \def\dayoftheweek #1#2#3{\doconvertday{\ctxlua{converters.weekday(\number#1,\number#2,\number#3)}}}
@@ -73,19 +73,19 @@
 % we could use an auxiliary macro to save some bytes in the format
 %
-% \def\dolanguagecharacters#1#2{\ctxlua{converters.alphabetic(\number#2,#1)}}
+% \def\dolanguagecharacters#1#2{\ctxlua{converters.alphabetic(#2,#1)}}
 
 % this does not belong here, but in a lang-module
 
-\def\thainumerals #1{\ctxlua{converters.alphabetic(\number#1,thai)}}
-\def\devanagarinumerals#1{\ctxlua{converters.alphabetic(\number#1,devanagari)}}
-\def\gurmurkhinumerals #1{\ctxlua{converters.alphabetic(\number#1,gurmurkhi)}}
-\def\gujaratinumerals #1{\ctxlua{converters.alphabetic(\number#1,gujarati)}}
-\def\tibetannumerals #1{\ctxlua{converters.alphabetic(\number#1,tibetan)}}
-\def\greeknumerals #1{\ctxlua{converters.alphabetic(\number#1,greek)}}
-\def\Greeknumerals #1{\ctxlua{converters.Alphabetic(\number#1,greek)}}
-\def\arabicnumerals#1{\ctxlua{converters.alphabetic(\number#1,arabic)}}
-\def\persiannumerals #1{\ctxlua{converters.alphabetic(\number#1,persian)}}
+\def\thainumerals #1{\ctxlua{converters.alphabetic(#1,thai)}}
+\def\devanagarinumerals#1{\ctxlua{converters.alphabetic(#1,devanagari)}}
+\def\gurmurkhinumerals #1{\ctxlua{converters.alphabetic(#1,gurmurkhi)}}
+\def\gujaratinumerals #1{\ctxlua
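The leading-zero problem described in this message is not specific to Lua: once the input is treated as a number, the zeros are gone. A minimal Python sketch of the string-based, digit-by-digit approach follows. It is an analogy only, not the ConTeXt code; `convert_digits` and `THAI_DIGITS` are hypothetical names, and the table uses the Unicode Thai digits U+0E50 to U+0E59.

```python
# Thai digits zero to nine occupy U+0E50 .. U+0E59 in Unicode.
THAI_DIGITS = [chr(0x0E50 + i) for i in range(10)]


def convert_digits(digits, table=THAI_DIGITS):
    """Map each ASCII digit of the string to the matching table entry."""
    return "".join(table[int(d)] for d in digits)


# Going through a number loses the leading zeros ...
print(int("000123"))
# ... while iterating over the string keeps all six digits.
print(convert_digits("000123"))
```

Passing the argument as a string, as the patch does by dropping `\number`, is what keeps every digit available to the converter.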
Re: [NTG-context] Creating account on wiki contextgarden
On Tue, Jul 29, 2008 at 7:55 PM, Hans Hagen wrote:
> luigi scarso wrote:
>> On Mon, Jul 28, 2008 at 10:28 PM, Mehdi Omidali wrote:
>>> Hi everyone,
>>>
>>> I want to translate "ConTeXt, an excursion" into Farsi. I went to
>>> http://wiki.contextgarden.net/ConTeXt_on_Excursion,_translations
>>> and tried to create an account to be able to access the source files.
>>> I ran into a problem with the question that guards against automated
>>> account creation, which is something like
>>>
>>>     (23 plus 8) times roman 'C'
>>>
>>> What must be entered as the answer to such a question? I tried
>>> everything, but with no success.
>>>
>>> Best wishes.
>>
>> I must admit that I would be confused if the question were mixed with
>> ancient Maya numbers.
>
> well, you're an original 'roman' guy so you'll get the easy creation question

Better to say no, otherwise one can argue that I'm also good with Etruscan numerals:
http://en.wikipedia.org/wiki/Etruscan_numerals

BTW, some linear equations like x = - IV can be problematic (actually {'nulla', 'N'} are valid solutions, but that's a historical matter):
http://en.wikipedia.org/wiki/Roman_numerals

One can say that such questions should be avoided because there is no reason for non-Roman people to know about Roman numerals (no more than there is for non-Maya people to know about Maya numerals), and that's generally true.

But given that we are talking about ConTeXt, and given that \romannumerals is a ConTeXt macro, in this particular case such questions are valid. This opens the door to similar questions (cf. core-con.lua and core-con.tex for Persian, Thai etc.), and given that Unicode will sooner or later cover all the writing systems of the human race, I expect that some day some questions will be mixed with Maya numerals.

--
luigi
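For readers puzzled by the sample question quoted above, the arithmetic can be worked out with a small Roman-numeral parser. This is a Python sketch with a hypothetical helper name `roman_to_int`; whether the wiki accepted the result in exactly this form is not stated in the thread.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral in standard subtractive notation to an int."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # A smaller value placed before a larger one is subtracted (IV = 4).
        if i + 1 < len(numeral) and value < ROMAN[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


# (23 plus 8) times roman 'C'
print((23 + 8) * roman_to_int("C"))  # prints 3100
```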
Re: [NTG-context] beta
Mojca Miklavec wrote:
> On 4/4/06, Taco Hoekwater wrote:
>> Hans Hagen wrote:
>>> Hi,
>>>
>>> - for mojca: take a look at regi-syn and let me know what vectors
>>>   need to be added to the distribution
>>
>> Mojca, it would be nice if you could give a go/nogo signal quickly.
>> I am slowly getting drowned with all the diff files so I am really
>> eager to have Hans go ahead and release a new version :)
>>
>> Taco
>
> Hans: I'm really really really sorry. I didn't notice that question in
> thousands of mails on the list.
>
> Thanks a lot for adding the file, Hans!
>
> This line
>     \defineregimesynonym[cp-1250] [cp1250]
> is not really needed: I never spotted any cp125* with a hyphen in between
> (in contrast to utf or iso encodings), otherwise everything seems to be
> working ok.
>     \defineregimesynonym[1250] [cp1250]
> is also OK (didn't think about it ;).
>
> If you're asking me about the other changes: here's the same list that
> I already suggested:
>
> renaming:
> windows  -> cp1252
> il1      -> iso-8858-1
> latin2   -> iso-8858-2
> iso88595 -> iso-8858-5
                    ^^
Everywhere should be 8859! Everything else seems all right to me.
Vit

> grk      -> iso-8859-7
>
> And then adding the following definitions (cp1250 is already there):
>
> \defineregimesynonym[utf-8][utf]
> \defineregimesynonym[utf8][utf]
>
> \defineregimesynonym[windows-1250][cp1250]
> \defineregimesynonym[windows-1251][cp1251]
> \defineregimesynonym[windows-1252][cp1252]
> \defineregimesynonym[windows-1253][cp1253]
> \defineregimesynonym[windows-1254][cp1254]
> %defineregimesynonym[windows-1255][cp1255] % not supported yet (Hebrew)
> %defineregimesynonym[windows-1256][cp1256] % not supported yet (Arabic)
> \defineregimesynonym[windows-1257][cp1257]
> %defineregimesynonym[windows-1258][cp1258] % not supported yet (Vietnamese)
>
> % for historical reasons / compatibility
> \defineregimesynonym[windows][cp1252]
>
> % 5 - Cyrillic
> % 6 - Arabic (not supported)
> % 7 - Greek
> % 8 - Hebrew (3 signs missing)
> % 11 - Thai (not supported)
>
> \defineregimesynonym[il1][iso-8859-1]
> \defineregimesynonym[il2][iso-8859-2]
> \defineregimesynonym[il3][iso-8859-3]
> \defineregimesynonym[il4][iso-8859-4]
> \defineregimesynonym[il5][iso-8859-9]
> \defineregimesynonym[il6][iso-8859-10]
> \defineregimesynonym[il7][iso-8859-13]
> %defineregimesynonym[il8][iso-8859-14]
> \defineregimesynonym[il9][iso-8859-15]
> \defineregimesynonym[il10][iso-8859-16]
>
> \defineregimesynonym[latin1][iso-8859-1]
> \defineregimesynonym[latin2][iso-8859-2]
> \defineregimesynonym[latin3][iso-8859-3]
> \defineregimesynonym[latin4][iso-8859-4]
> \defineregimesynonym[latin5][iso-8859-9]
> \defineregimesynonym[latin6][iso-8859-10]
> \defineregimesynonym[latin7][iso-8859-13]
> %defineregimesynonym[latin8][iso-8859-14]
> \defineregimesynonym[latin9][iso-8859-15]
> \defineregimesynonym[latin10][iso-8859-16]
>
> % for historical reasons / compatibility
> \defineregimesynonym[iso88595][iso-8859-5]
> \defineregimesynonym[grk][iso-8859-7]
>
> I don't know whether and how often people use all those encodings (I'm
> only pretty sure that people use the cp1250 one). LaTeX offers all of
> them, for example. I would suggest at least renaming the five regimes
> mentioned above and pointing to the more consistent names using synonyms.
>
> The mentioned regimes are all present on
> http://pub.mojca.org/tex/enco/contextbase/, so it's up to you whether you
> add any of the other regimes to the distribution or perhaps better wait
> till someone requests them. (There are so many files that taking them all
> would almost require a separate folder.) I'm happy now that cp1250 is in
> and I'm not using any other regime, so it's really not my decision.
>
> As far as I remember there were also some inconsistencies in the present
> Greek and Cyrillic regimes.
>
> http://pub.mojca.org/tex/enco/contextbase/regi-vis.tex is slightly
> different from the file in the distro (uses named glyphs), but
> conceptually the same.
>
> Mojca

--
=======================================================
Ing. Vít Zýka, Ph.D.                         TYPOkvítek

database publishing          databazove publikovani
data maintaining and typesetting in typographic quality
priprava dat a jejich sazba v typograficke kvalite

tel.: (+420) 777 198 189     www: http://typokvitek.com
=======================================================
___
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context
Re: [NTG-context] beta
On 4/4/06, Taco Hoekwater wrote:
> Hans Hagen wrote:
>> Hi,
>>
>> - for mojca: take a look at regi-syn and let me know what vectors
>>   need to be added to the distribution
>
> Mojca, it would be nice if you could give a go/nogo signal quickly.
> I am slowly getting drowned with all the diff files so I am really
> eager to have Hans go ahead and release a new version :)
>
> Taco

Hans: I'm really really really sorry. I didn't notice that question in thousands of mails on the list.

Thanks a lot for adding the file, Hans!

This line
    \defineregimesynonym[cp-1250] [cp1250]
is not really needed: I never spotted any cp125* with a hyphen in between (in contrast to utf or iso encodings), otherwise everything seems to be working ok.
    \defineregimesynonym[1250] [cp1250]
is also OK (didn't think about it ;).

If you're asking me about the other changes: here's the same list that I already suggested:

renaming:
windows  -> cp1252
il1      -> iso-8858-1
latin2   -> iso-8858-2
iso88595 -> iso-8858-5
grk      -> iso-8859-7

And then adding the following definitions (cp1250 is already there):

\defineregimesynonym[utf-8][utf]
\defineregimesynonym[utf8][utf]

\defineregimesynonym[windows-1250][cp1250]
\defineregimesynonym[windows-1251][cp1251]
\defineregimesynonym[windows-1252][cp1252]
\defineregimesynonym[windows-1253][cp1253]
\defineregimesynonym[windows-1254][cp1254]
%defineregimesynonym[windows-1255][cp1255] % not supported yet (Hebrew)
%defineregimesynonym[windows-1256][cp1256] % not supported yet (Arabic)
\defineregimesynonym[windows-1257][cp1257]
%defineregimesynonym[windows-1258][cp1258] % not supported yet (Vietnamese)

% for historical reasons / compatibility
\defineregimesynonym[windows][cp1252]

% 5 - Cyrillic
% 6 - Arabic (not supported)
% 7 - Greek
% 8 - Hebrew (3 signs missing)
% 11 - Thai (not supported)

\defineregimesynonym[il1][iso-8859-1]
\defineregimesynonym[il2][iso-8859-2]
\defineregimesynonym[il3][iso-8859-3]
\defineregimesynonym[il4][iso-8859-4]
\defineregimesynonym[il5][iso-8859-9]
\defineregimesynonym[il6][iso-8859-10]
\defineregimesynonym[il7][iso-8859-13]
%defineregimesynonym[il8][iso-8859-14]
\defineregimesynonym[il9][iso-8859-15]
\defineregimesynonym[il10][iso-8859-16]

\defineregimesynonym[latin1][iso-8859-1]
\defineregimesynonym[latin2][iso-8859-2]
\defineregimesynonym[latin3][iso-8859-3]
\defineregimesynonym[latin4][iso-8859-4]
\defineregimesynonym[latin5][iso-8859-9]
\defineregimesynonym[latin6][iso-8859-10]
\defineregimesynonym[latin7][iso-8859-13]
%defineregimesynonym[latin8][iso-8859-14]
\defineregimesynonym[latin9][iso-8859-15]
\defineregimesynonym[latin10][iso-8859-16]

% for historical reasons / compatibility
\defineregimesynonym[iso88595][iso-8859-5]
\defineregimesynonym[grk][iso-8859-7]

I don't know whether and how often people use all those encodings (I'm only pretty sure that people use the cp1250 one). LaTeX offers all of them, for example. I would suggest at least renaming the five regimes mentioned above and pointing to the more consistent names using synonyms.

The mentioned regimes are all present on http://pub.mojca.org/tex/enco/contextbase/, so it's up to you whether you add any of the other regimes to the distribution or perhaps better wait till someone requests them. (There are so many files that taking them all would almost require a separate folder.) I'm happy now that cp1250 is in and I'm not using any other regime, so it's really not my decision.

As far as I remember there were also some inconsistencies in the present Greek and Cyrillic regimes.

http://pub.mojca.org/tex/enco/contextbase/regi-vis.tex is slightly different from the file in the distro (uses named glyphs), but conceptually the same.

Mojca
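The synonym scheme proposed in this message amounts to a lookup from alias to canonical regime name. The Python sketch below models only that idea; it says nothing about how \defineregimesynonym is implemented in ConTeXt, and `SYNONYMS` (a subset of the list above) and `resolve_regime` are hypothetical names.

```python
# A subset of the alias -> canonical pairs suggested in the message.
SYNONYMS = {
    "utf-8": "utf",
    "utf8": "utf",
    "windows-1250": "cp1250",
    "windows": "cp1252",
    "il1": "iso-8859-1",
    "latin1": "iso-8859-1",
    "il5": "iso-8859-9",
    "latin5": "iso-8859-9",
    "latin9": "iso-8859-15",
    "iso88595": "iso-8859-5",
    "grk": "iso-8859-7",
}


def resolve_regime(name, synonyms=SYNONYMS):
    """Follow synonym links until a name with no further alias is reached."""
    seen = set()
    while name in synonyms:
        if name in seen:
            raise ValueError("circular regime synonym: " + name)
        seen.add(name)
        name = synonyms[name]
    return name


print(resolve_regime("latin5"))
print(resolve_regime("cp1250"))
```

Keeping one canonical name per regime and routing the historical names (windows, il1, grk) through the table is what lets old documents keep working after a rename.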
Re: [NTG-context] Character names (was: Context 2005.12.19 released)
Taco Hoekwater wrote:
> Here's what I can come up with. At least a few are acceptable,
> like the horizontal bar. \textnumero exists, but is only reachable
> in Cyrillic encodings (fixable, I guess?), and the Greek and Vietnamese
> accents are also only usable in the correct encoding.
>
> I've used the \text... versions of the accents, but perhaps
> the actual commands are more correct (like \' and \~).
>
> Cheers, Taco
>
> \starttext
> \definecharacter texthorizontalbar {{--\kern 0pt--}}
> \definecharacter textdong {\underbar{\dstroke}}

Thanks for those ...

> \NC 0300 COMBINING GRAVE ACCENT \NC \textgrave     \NC \NR
> \NC 0309 COMBINING HOOK ABOVE   \NC \texthookabove \NC \NR
> \NC 0303 COMBINING TILDE        \NC \texttilde     \NC \NR
> \NC 0301 COMBINING ACUTE ACCENT \NC \textacute     \NC \NR
> \NC 0323 COMBINING DOT BELOW    \NC \textbottomdot \NC \NR

I may be wrong, but aren't those used only in combination with other characters? I don't know if TeX (ConTeXt) can handle this (at least not yet). When I wrote the list a couple of days ago I forgot about that fact. If the accent came before the character, this could be replaced by \buildtextaccent..., but here there's perhaps no solution without some additional macros. (And since the Vietnamese seem to be satisfied with viscii and utf for now, supporting cp1258 is not crucial.)

I double-checked the differences between the existing regimes and the ones that were automatically produced by a script. The list of regimes that are ripe for supporting is thus:

cp125[ 0 | *1 | *2 | 3 | 4 | 7 ]
iso-8859-[ *1 | *2 | 3 | 4 | *5 | *7 | 9 | 13 | *15 | 16 ]
*viscii (with glyph names instead of \\u\...)

(The ones marked with a star are already supported, perhaps with some inconsistencies. Not supported: Hebrew, Arabic, Vietnamese? for cp125X and Arabic, Thai and Celtic for iso-8859-X.)

I'll send the files (full content is already on my page), but I need to know how to split/group them (I guess it would be a bad idea to have one file for each encoding). Should there be one file for iso-8859 and one for windows encodings? What about those regimes that are already supported? I would like to move at least regi-win (with 8 wrong definitions anyway) to a less discriminating place; I don't know what to do with Greek and Cyrillic.

And another set of questions:

1. Can someone check for (in)consistencies for greekupsilondiaeresis vs. greekupsilondialytika? Looks like the same glyph named differently at different places (functionality may break).

2. What to do with

   {\cyrillicGJE} {\'\cyrillicG}                % 0403 CYRILLIC CAPITAL LETTER GJE
   {\cyrillicgje} {\'\cyrillicg}                % 0453 CYRILLIC SMALL LETTER GJE
   {\cyrillicKJE} {\'\cyrillicK}                % 040C CYRILLIC CAPITAL LETTER KJE
   {\cyrillickje} {\'\cyrillick}                % 045C CYRILLIC SMALL LETTER KJE
   {\cyrillicgheupturn} {\cyrillicgup}          % 0491 CYRILLIC SMALL LETTER GHE WITH UPTURN

   Which variant is better? Would it make sense to define

   \definecharacter cyrillicGJE {\buildtextaccent\textacute\cyrillicG}
   \defineaccent ' \cyrillicG {\cyrillicGJE}

   and then use \cyrillicGJE consistently?

3. PLEASE FIX: in enco-def.tex replace \cdots by something (\dots, I suppose, but I'm not sure)

   \definecharacter textellipsis {\mathematics\cdots}

   (I guess this bug was the reason for changing some definitions in regimes/encodings elsewhere.) Should \textellipsis be used for 2026 HORIZONTAL ELLIPSIS or anything else?

4. \softhyphen, \hyphen or \- for 00AD SOFT HYPHEN?

5. Urgently: what to do with quotations (without language discrimination if possible)?

   % 201A SINGLE LOW-9 QUOTATION MARK
   \quotesinglebase vs. \lowerleftsingleninequote
   % 201E DOUBLE LOW-9 QUOTATION MARK
   \quotedblbase vs. \lowerleftdoubleninequote
   % 2018 LEFT SINGLE QUOTATION MARK
   \quoteleft vs. \upperleftsinglesixquote
   % 2019 RIGHT SINGLE QUOTATION MARK
   \quoteright vs. \upperrightsingleninequote
   % 201C LEFT DOUBLE QUOTATION MARK
   \quotedblleft vs. \upperleftdoublesixquote
   % 201D RIGHT DOUBLE QUOTATION MARK
   \quotedblright vs. \upperrightdoubleninequote
   % 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
   \guilsingleleft vs. \leftsubguillemot
   % 203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
   \guilsingleright vs. \rightsubguillemot
   % 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
   \leftguillemot vs. \greekleftquot
   (are Greek quotations treated specially, or what is this doing in regi-grk?)
   % 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
   \rightguillemot vs. \greekrightquot vs. \prewordbreak\rightguillemot
   (from my point of view the last one may be better, but it's not fair since it's language dependent: it may be OK for French, but not for German, or vice versa; perhaps a language-sensitive macro could be inserted at this place?)

6. \textnumero, 0x2116 (and perhaps some other characters) should be added to unicode vector 33.

7. The files regi-il1 and regi-win have many inconsistencies. I would like to suggest the following renamings:

   windows -> cp1252
   il1 -> iso
[NTG-context] Character names (was: Context 2005.12.19 released)
Hans Hagen wrote:
> Mojca Miklavec wrote:
>> Taco Hoekwater wrote:
>>> New features since 2005.12.18:
>>> * Support for the latin-9 regime (latin-1 + euro)
>>
>> There are some more (automatically generated) regime definitions at
>> http://pub.mojca.org/tex/enco/contextbase/ (only from the glyph names
>> that I was able to extract from the existing files, so it's only OK for
>> some of the regimes mentioned there).
>>
>> If possible, I would like to ask for core support for windows-1250
>> (perhaps other users may find some other regimes useful as well).
>
> just send me the files you feel confident with

(I'll send the good files soon.)

Except for Celtic, Thai, Arabic and Hebrew (although the letter names for Hebrew are almost completely defined), almost all the windows and ISO regimes are OK; just some glyphs are missing (which are, or at least were, missing in the Unicode vectors as well).

If anyone has suggestions for names for the following characters, 6 additional regimes can be fully supported:

windows-1251 and iso-8859-5
    2116 NUMERO SIGN

windows-1253
    0385 GREEK DIALYTIKA TONOS
    2015 HORIZONTAL BAR
    0384 GREEK TONOS

windows-1258
    0300 COMBINING GRAVE ACCENT
    0309 COMBINING HOOK ABOVE
    0303 COMBINING TILDE
    0301 COMBINING ACUTE ACCENT
    0323 COMBINING DOT BELOW
    20AB DONG SIGN

iso-8859-7
    20AF DRACHMA SIGN
    037A GREEK YPOGEGRAMMENI
    2015 HORIZONTAL BAR
    0384 GREEK TONOS
    0385 GREEK DIALYTIKA TONOS

iso-8859-10
    2015 HORIZONTAL BAR

Mojca
Re: [NTG-context] Character names (was: Context 2005.12.19 released)
Here's what I can come up with. At least a few are acceptable, like the horizontal bar. \textnumero exists, but is only reachable in Cyrillic encodings (fixable, I guess?), and the Greek and Vietnamese accents are also only usable in the correct encoding.

I've used the \text... versions of the accents, but perhaps the actual commands are more correct (like \' and \~).

Cheers, Taco

\starttext
\definecharacter texthorizontalbar {{--\kern 0pt--}}
\definecharacter textdong {\underbar{\dstroke}}
\starttabulate[|c|c|]
\NC 0300 COMBINING GRAVE ACCENT \NC \textgrave           \NC \NR
\NC 0309 COMBINING HOOK ABOVE   \NC \texthookabove       \NC \NR
\NC 0303 COMBINING TILDE        \NC \texttilde           \NC \NR
\NC 0301 COMBINING ACUTE ACCENT \NC \textacute           \NC \NR
\NC 0323 COMBINING DOT BELOW    \NC \textbottomdot       \NC \NR
\NC 037A GREEK YPOGEGRAMMENI    \NC \unknownchar         \NC \NR % prime?
\NC 0384 GREEK TONOS            \NC \greektonos          \NC \NR
\NC 0385 GREEK DIALYTIKA TONOS  \NC \greekdialytikatonos \NC \NR
\NC 2015 HORIZONTAL BAR         \NC \texthorizontalbar   \NC \NR
\NC 20AB DONG SIGN              \NC \textdong            \NC \NR
\NC 20AF DRACHMA SIGN           \NC \unknownchar         \NC \NR
\NC 2116 NUMERO SIGN            \NC \textnumero          \NC \NR
\stoptabulate
\stoptext

Mojca Miklavec wrote:
> Hans Hagen wrote:
>> Mojca Miklavec wrote:
>>> Taco Hoekwater wrote:
>>>> New features since 2005.12.18:
>>>> * Support for the latin-9 regime (latin-1 + euro)
>>>
>>> There are some more (automatically generated) regime definitions at
>>> http://pub.mojca.org/tex/enco/contextbase/ (only from the glyph names
>>> that I was able to extract from the existing files, so it's only OK for
>>> some of the regimes mentioned there).
>>>
>>> If possible, I would like to ask for core support for windows-1250
>>> (perhaps other users may find some other regimes useful as well).
>>
>> just send me the files you feel confident with
>
> (I'll send the good files soon.)
>
> Except for Celtic, Thai, Arabic and Hebrew (although the letter names for
> Hebrew are almost completely defined), almost all the windows and ISO
> regimes are OK; just some glyphs are missing (which are, or at least
> were, missing in the Unicode vectors as well).
>
> If anyone has suggestions for names for the following characters, 6
> additional regimes can be fully supported:
>
> windows-1251 and iso-8859-5
>     2116 NUMERO SIGN
>
> windows-1253
>     0385 GREEK DIALYTIKA TONOS
>     2015 HORIZONTAL BAR
>     0384 GREEK TONOS
>
> windows-1258
>     0300 COMBINING GRAVE ACCENT
>     0309 COMBINING HOOK ABOVE
>     0303 COMBINING TILDE
>     0301 COMBINING ACUTE ACCENT
>     0323 COMBINING DOT BELOW
>     20AB DONG SIGN
>
> iso-8859-7
>     20AF DRACHMA SIGN
>     037A GREEK YPOGEGRAMMENI
>     2015 HORIZONTAL BAR
>     0384 GREEK TONOS
>     0385 GREEK DIALYTIKA TONOS
>
> iso-8859-10
>     2015 HORIZONTAL BAR
>
> Mojca
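To see where the characters listed in this thread sit in each encoding, one can query an independent set of mapping tables. The Python sketch below uses Python's built-in codecs as a stand-in; these are not the ConTeXt regime files, and `byte_for` and `MISSING` are hypothetical names. The code points are the ones named in the message.

```python
import unicodedata


def byte_for(codec, char):
    """Return the single byte that encodes char in codec, or None if absent."""
    try:
        encoded = char.encode(codec)
    except UnicodeEncodeError:
        return None
    return encoded[0] if len(encoded) == 1 else None


# Characters reported as lacking a name, grouped by Python codec name.
MISSING = {
    "cp1251": ["\u2116"],
    "iso8859_5": ["\u2116"],
    "cp1253": ["\u0385", "\u2015", "\u0384"],
    "cp1258": ["\u0300", "\u0309", "\u0303", "\u0301", "\u0323", "\u20ab"],
    "iso8859_10": ["\u2015"],
}

for codec, chars in MISSING.items():
    for ch in chars:
        slot = byte_for(codec, ch)
        where = "not present" if slot is None else f"0x{slot:02X}"
        print(f"{codec:11} U+{ord(ch):04X} {unicodedata.name(ch)}: {where}")
```

Such a listing gives the slot in each regime that needs a named glyph; choosing the glyph name itself remains the editorial decision discussed in the thread.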
[NTG-context] towards some more consistency in regimes unicode support
the case in ConTeXt unless something has changed recently. There are many other letter wrongly named in Unicode (with cedilla), although they have a comma. I would suggest to name them \[gklnr]commaaccent and use \[gklnr]cedilla as a synonym (if needed at all for backward compatibility, otherwise it would be better to leave them out; there is no such letter with cedilla in unicode, if someone needs one, he can construct one trivially with \buildtextaccent) 7. there's a-kind-of-bug-but-not-really-one in enco-ans.tex. textcedilla maps to 184, which isn't defined in Antykwa for example (it's on place 24). It's more a bug in texnansi encoding, which has cedilla on two places, which is pretty stupid. But anyway: \definecharacter textcedilla 24 would solve some problems (and hopefully not introduce new ones). 8. most letters are named c with cedilla - ccedilla what about the names for open o, turned e, long s, turned r with hook? \openo or \oopen? \rturnedhook or \turnedrhook? 9. can latin letters and numbers be accessed somehow by name? 10. Adam prepared some dingbats support I think, this could be added here. 11. There's a showunicode pdf document on pragma-ade.com (at least I saw it once), but it's not listed on the overview.htm. 12. I don't know if anyone would ever need to switch from viscii regime to some other, but what would happen to the characters under 128 (some of them are redefined in viscii)? I'm affraid that there would remain Vietnamese leftovers in the lower part of the table. 13. If there are any other comments on the table and/or the script(s), please let me know. IV. With the help of the prepared names list I processed definitions for regimes (taken from Unicode webpage) for ISO-8859-* and cp125* (others should be trivial). They are only preliminary, some (Hebrew, Thai, Arabic) probably don't make any sense yet, but could the rest be added to ConTeXt after someone checks if everything is OK? 
(The iso88595, cp1251, il1, il2, il9, windows and viscii regimes already exist and should be compared for differences.) If possible, this should be done in such a way that it isn't necessary to include the regime definition file manually: just as \usemodule[pre-polish] finds and processes the proper file, \enableregime[xxx] should find the proper file and load it.

(And for those who made it this far: sorry again for that gigantic mail.)

Mojca
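To illustrate point IV: a minimal sketch, in Python, of how the upper half of a regime could be listed from an existing codec table so that it can be compared with a hand-written definition. It assumes Python's built-in codecs agree with the Unicode mapping files; the synonym table and the function names (`resolve_regime`, `regime_slots`) are hypothetical and not part of ConTeXt.

```python
import codecs
import unicodedata

# Hypothetical synonym table: short regime names mentioned in the mail,
# mapped to the full encoding names. Not taken from ConTeXt itself.
REGIME_SYNONYMS = {
    "il1": "iso-8859-1",
    "il2": "iso-8859-2",
    "iso88595": "iso-8859-5",
}


def resolve_regime(name):
    """Map a regime name or synonym to an encoding name."""
    key = name.strip().lower()
    return REGIME_SYNONYMS.get(key, key)


def regime_slots(name):
    """Return {slot: (code point, Unicode name)} for slots 128-255."""
    codec = codecs.lookup(resolve_regime(name))  # LookupError if unknown
    slots = {}
    for slot in range(128, 256):
        try:
            char = bytes([slot]).decode(codec.name)
        except UnicodeDecodeError:
            continue  # slot is unassigned in this encoding
        # Control characters have no Unicode name, hence the default.
        slots[slot] = (ord(char), unicodedata.name(char, "<unnamed>"))
    return slots


if __name__ == "__main__":
    for slot, (codepoint, uname) in sorted(regime_slots("il2").items()):
        print(f"{slot:3d}  U+{codepoint:04X}  {uname}")
```

For example, `regime_slots("il2")[0xE8]` gives code point U+010D with the Unicode name LATIN SMALL LETTER C WITH CARON, which is the kind of entry a names list then has to map to a macro name.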
[NTG-context] Regimes to be supported; Comments?
Hello,

Some time ago there was a discussion about extending support for different regimes in ConTeXt. The list of (to-be-)supported regimes probably depends strongly on the implementation (ruby+iconv?). I collected a preliminary list of candidate regimes and possible synonyms (some synonyms are listed for backward compatibility and have to stay), leaving out most of the eastern encodings (not because they shouldn't be on the list, but because I'm completely ignorant about them). Hans suggested posting this to the mailing list first to get some useful comments and suggestions.

#

The following question should probably go in a separate thread, but it's a very similar topic. In July 2006 Ljubljana will host people from around 85 countries of the world. One of the very ambitious organizers has been dreaming for a couple of years already of printing the participants' names (on honourable mentions, for example) both in Latin transcription and as they are written in the original script (under the assumption that the names are properly entered in a UTF-8 database). This is probably not possible for every single obscure language, but does it in general sound like:

a) Good luck (I don't want to be in your place)!
b) Take a good (commercial) program.
c) If you're ready to invest the rest of your time (forget about hobbies!), it's probably doable in LaTeX or ConTeXt by then.
č) Forget about TeX - it will be possible to solve this problem one day with Unicode and one of the new TeX engines. But until then it's not worth the effort, because any effort you invest will become obsolete in a couple of years.

To be honest, even some of the people who will translate the materials into their native language will probably do that with paper, pencil and scanner.

#

Mojca

And here are the encodings:

# ISO
ISO-8859-1 Western
ISO-8859-2 Central European
ISO-8859-3 South European
ISO-8859-4 Baltic
ISO-8859-5 Cyrillic
ISO-8859-6 Arabic
ISO-8859-7 Greek
ISO-8859-8 Hebrew Visual
ISO-8859-8-I Hebrew (???)
What is that?
ISO-8859-9 Turkish
ISO-8859-10 Nordic
ISO-8859-11 Thai
ISO-8859-13 Baltic
ISO-8859-14 Celtic
ISO-8859-15 Western
ISO-8859-16 Romanian

\defineregimesynonym[il*][iso-8859-*], *=1-16\12
\defineregimesynonym[latin*][iso-8859-*], *=1-16\12
\defineregimesynonym[cp819][iso-8859-1]
% I'm not sure that anyone needs these:
\defineregimesynonym[iso-ir-100][iso-8859-1]
\defineregimesynonym[iso-ir-101][iso-8859-2]
\defineregimesynonym[iso-ir-109][iso-8859-3]
\defineregimesynonym[iso-ir-110][iso-8859-4]
\defineregimesynonym[iso-ir-144][iso-8859-5]
\defineregimesynonym[iso-ir-127][iso-8859-6]
\defineregimesynonym[iso-ir-126][iso-8859-7]
\defineregimesynonym[iso-ir-138][iso-8859-8]
\defineregimesynonym[iso-ir-148][iso-8859-9]
\defineregimesynonym[iso-ir-157][iso-8859-10]
\defineregimesynonym[iso-ir-179][iso-8859-13]
\defineregimesynonym[iso-ir-199][iso-8859-14]
\defineregimesynonym[iso-ir-203][iso-8859-15]
\defineregimesynonym[iso-ir-226][iso-8859-16]
% backward compatibility
\defineregimesynonym[iso88595][iso-8859-5]

(recode also recognises arabic, greek, cyrillic, hebrew as aliases for those encodings; I don't know if this is a good idea, as there are other charsets covering the same language groups as well)

# APPLE
MacArabic
MacCeltic
MacCentralEuropean % CentEur, CentralEurope or CentralEuropean? or all of them?
MacChineseSimplified
MacChineseTraditional
MacCroatian
MacCyrillic
MacDevanagari
MacDingbats
MacFarsi
MacGaelic
MacGreek
MacGujarati
MacGurmukhi
MacHebrew
MacIcelandic
MacInuit
MacJapanese
MacKeyboard
MacKorean
MacRoman
MacRomanian
MacSymbol
MacThai
MacTurkish
MacUkrainian

\defineregimesynonym[MacCE][MacCentralEuropean]
\defineregimesynonym[mac][MacRoman]
\defineregimesynonym[maccyr][MacCyrillic]
\defineregimesynonym[macukr][MacUkrainian]

(I also need some help here: sometimes Mac encodings are defined using adjectives, sometimes using nouns, like Ukraine/Ukrainian. Should only one of them (which?) be used or both of them?
On the Unicode page, Mac encodings appear twice, the second time under Microsoft/Apple, containing MacCyrillic, MacGreek, MacIceland, MacLatin2, MacRoman, MacTurkish. I didn't really get the point of that.)

# IBM
% essentially the same as under Microsoft, with some minor changes (to be processed manually, if these are to be supported)

# MICROSOFT
EBCDIC % plenty of them are missing on the web
cp037
cp500
cp875
cp1026
PC
cp437 LatinUS
cp737 Greek
cp775 BaltRim
cp850 Latin1
cp852 Latin2
cp855 Cyrillic
cp857 Turkish
cp860 Portuguese
cp861 Icelandic
cp862 Hebrew