Re: [NTG-context] Transliteration

2022-02-03 Thread Hans Hagen via ntg-context

On 2/3/2022 10:01 PM, Mojca Miklavec wrote:

> On Thu, 3 Feb 2022 at 21:41, Hans Hagen wrote:
> >
> > > I have also merged the Serbian hyphenation patterns, so there is no need
> > > to switch the language in order to have hyphenation in transliterated text.
> > > That was possible because Cyrillic and Latin scripts use different code
> > > points, and there are no conflicts in patterns.
> > > So I suggest merging the patterns for Serbian Cyrillic and Latin.
> >
> > I'd like to hear Arthur / Mojca on that ... we can of course load them
> > both but if that is an upstream merge I'll wait for that
>
> Yes, loading both patterns at once is definitely the correct approach.
> That's what the rest of the TeX world already does (at least LuaTeX
> and XeTeX; not pdfTeX, of course), see
>
> https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/hyph-utf8/loadhyph/loadhyph-sr-latn.tex
>
> We have two sets of Cyrillic patterns (and several Latin ones as
> well), so composing a single file was a bit of a (somewhat political)
> challenge.
> Now, at least in theory, users are free to choose which of the two
> sets of patterns they want.
>
> I never checked what ConTeXt was doing with the Serbian patterns.
> Personally I would suggest taking hyph-sh-cyrl.pat.txt and hyph-sh-latn.pat.txt.

we currently do this:

{ "sr", "hyph-sr", "serbian", false, { "hyph-sr-cyrl", "hyph-sr-latn" }, },

so you suggest replacing that by the "sh" variants?

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___


Re: [NTG-context] Transliteration

2022-02-03 Thread Mojca Miklavec via ntg-context
On Thu, 3 Feb 2022 at 21:41, Hans Hagen wrote:
>
> > I have also merged the Serbian hyphenation patterns, so there is no need
> > to switch the language in order to have hyphenation in transliterated text.
> > That was possible because Cyrillic and Latin scripts use different code
> > points, and there are no conflicts in patterns.
> > So I suggest merging the patterns for Serbian Cyrillic and Latin.
>
> I'd like to hear Arthur / Mojca on that ... we can of course load them
> both but if that is an upstream merge I'll wait for that

Yes, loading both patterns at once is definitely the correct approach.
That's what the rest of the TeX world already does (at least LuaTeX
and XeTeX; not pdfTeX, of course), see

https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/hyph-utf8/loadhyph/loadhyph-sr-latn.tex

We have two sets of Cyrillic patterns (and several Latin ones as
well), so composing a single file was a bit of a (somewhat political)
challenge.
Now, at least in theory, users are free to choose which of the two
sets of patterns they want.

I never checked what ConTeXt was doing with the Serbian patterns.
Personally I would suggest taking hyph-sh-cyrl.pat.txt and hyph-sh-latn.pat.txt.

Mojca
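
The argument in this thread is that the Cyrillic and Latin pattern sets can be loaded together because a hyphenation pattern only ever applies to words containing its own letters. A minimal plain-Lua sketch of that check follows. The pattern lists are made up for illustration and are not taken from hyph-utf8, and the helper names `letters`, `disjoint` and `merge` are hypothetical; none of this is ConTeXt's own loading code.

```lua
-- Plain Lua 5.3+ sketch (uses the standard utf8 library).
-- Toy pattern lists, invented for this example; NOT real hyph-utf8 patterns.
local cyrillic = { "а1б", "1в", "љ1у" }
local latin    = { "a1b", "1v", "lj1u" }

-- Collect the set of letters used in a pattern list, skipping the
-- digits and dots that carry the hyphenation weights and word edges.
local function letters(patterns)
    local set = { }
    for _, p in ipairs(patterns) do
        for ch in p:gmatch(utf8.charpattern) do
            if not ch:match("^[%d%.]$") then
                set[ch] = true
            end
        end
    end
    return set
end

-- True when the two letter sets have nothing in common.
local function disjoint(a, b)
    for ch in pairs(a) do
        if b[ch] then
            return false
        end
    end
    return true
end

-- Concatenate two pattern lists into a new list.
local function merge(a, b)
    local merged = { }
    for _, p in ipairs(a) do merged[#merged + 1] = p end
    for _, p in ipairs(b) do merged[#merged + 1] = p end
    return merged
end

local safe   = disjoint(letters(cyrillic), letters(latin))
local merged = merge(cyrillic, latin)
print(safe, #merged)
```

If `disjoint` returns false, at least one letter occurs in both sets, and a merged file could then change the hyphenation of existing words.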


[NTG-context] lmtx

2022-02-03 Thread Hans Hagen via ntg-context

Hi,

There haven't been updates and the reason is that we're in the middle of 
math and/or wrapping up on transliteration / indic (for which there is 
also a nice wiki page being made).


Concerning math, we're progressing with some rather flexible spacing,
penalty and linebreak extensions, but we don't want to risk that they
give different results when they are not used (probably not everything
can be completely compatible, but that is a side effect of getting rid
of some hard-coded assumptions in the engine). One objective is to make
input a bit easier and more predictable, especially with regard to the
often needed correction spacing (\, and friends).


Anyway, in addition to Mikael's request for input a while ago ... if
there are 'tex math annoyances' that you think originate in the way tex
expects input, and you'd like them to be taken into account, let us know.


Hans



Re: [NTG-context] Transliteration

2022-02-03 Thread Hans Hagen via ntg-context

On 2/3/2022 8:15 PM, Ivan Pešić via ntg-context wrote:

> Hello!
> I've been working on a Serbian book that I had to transliterate from
> Cyrillic to Latin.
> There's been some nice improvement in transliteration, and I would like
> to propose a small change.
> One peculiarity that the current transliteration mechanisms (both the
> internal one and the third-party module by Philipp Gesang) don't handle
> is that Љ, Њ and Џ are transliterated as Lj, Nj and Dž in ordinary words
> that start a sentence and in names that normally begin with a capital
> letter, but as LJ, NJ and DŽ in titles written in all capitals.
> So the quick solution was to update the current mapping vector, setting
> the correct 30 letters used in Serbian, and to add another one (attached)
> that maps the Cyrillic capitals to LJ, NJ and DŽ.
> It requires a bit more manual work to select the correct mapping for
> all-capitals text, but it works.
> I have also merged the Serbian hyphenation patterns, so there is no need
> to switch the language in order to have hyphenation in transliterated text.
> That was possible because Cyrillic and Latin scripts use different code
> points, and there are no conflicts in patterns.
>
> So I suggest merging the patterns for Serbian Cyrillic and Latin.


I'd like to hear Arthur / Mojca on that ... we can of course load them
both but if that is an upstream merge I'll wait for that


you can actually map multiple to multiple in the transliteration tables

["foo"] = "oof"

and such, and in the next version there is also an exception mechanism
that permits cloning a transliteration and adding exceptions
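
To make the clone-plus-exceptions idea concrete, here is a minimal sketch in plain Lua. It is not the ConTeXt exception mechanism mentioned above, whose interface is not shown in this thread; the helpers `clone_with_exceptions` and `transliterate` are hypothetical names, and the mapping is abbreviated to the letters the demo needs. It only illustrates how an all-capitals variant can be derived from the base Serbian mapping instead of duplicating all 30 letters.

```lua
-- Plain Lua 5.3+ sketch (uses the standard utf8 library).
-- This is NOT the ConTeXt transliteration API.

-- Base mapping, abbreviated to the entries used below.
local c2l = {
    ["Љ"] = "Lj", ["љ"] = "lj",
    ["Њ"] = "Nj", ["њ"] = "nj",
    ["Џ"] = "Dž", ["џ"] = "dž",
    ["У"] = "U",  ["у"] = "u",
    ["Д"] = "D",  ["д"] = "d",
    ["И"] = "I",  ["и"] = "i",
}

-- Hypothetical helper: shallow-copy a mapping, then apply the overrides.
local function clone_with_exceptions(base, exceptions)
    local copy = { }
    for k, v in pairs(base) do copy[k] = v end
    for k, v in pairs(exceptions) do copy[k] = v end
    return copy
end

-- All-capitals variant: only the three digraph capitals differ.
local C2L = clone_with_exceptions(c2l, {
    ["Љ"] = "LJ",
    ["Њ"] = "NJ",
    ["Џ"] = "DŽ",
})

-- Hypothetical helper: replace every UTF-8 character that has an entry
-- in the mapping; characters without an entry are left unchanged.
local function transliterate(text, mapping)
    return (text:gsub(utf8.charpattern, function(ch)
        return mapping[ch]
    end))
end

print(transliterate("Људи", c2l))  -- Ljudi
print(transliterate("ЉУДИ", C2L))  -- LJUDI
```

Because the copy is shallow and the overrides are applied last, later changes to the base table are not picked up by the clone; a real implementation might prefer a metatable fallback instead.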


> There is another issue when a drop cap is followed by the rest of the
> first word and several more words in small caps.
> If the first letter is Љ (or one of the other two letters that
> transliterate as digraphs), the second letter of the digraph is not
> typeset in small caps, because it gets injected before the group that
> turns on small caps.
> For example:
>
>    \placeinitial
>    Љ{\sc уди нису знали}
>
> but this is quite a special case...

you can use \settransliteration{name} locally, so also as part of a style
specification (there is also \resettransliteration)


the next upload has some more that Sreeram is currently documenting on
the wiki


Hans



[NTG-context] Transliteration

2022-02-03 Thread Ivan Pešić via ntg-context

Hello!
I've been working on a Serbian book that I had to transliterate from
Cyrillic to Latin.
There's been some nice improvement in transliteration, and I would like
to propose a small change.
One peculiarity that the current transliteration mechanisms (both the
internal one and the third-party module by Philipp Gesang) don't handle
is that Љ, Њ and Џ are transliterated as Lj, Nj and Dž in ordinary words
that start a sentence and in names that normally begin with a capital
letter, but as LJ, NJ and DŽ in titles written in all capitals.
So the quick solution was to update the current mapping vector, setting
the correct 30 letters used in Serbian, and to add another one (attached)
that maps the Cyrillic capitals to LJ, NJ and DŽ.
It requires a bit more manual work to select the correct mapping for
all-capitals text, but it works.
I have also merged the Serbian hyphenation patterns, so there is no need
to switch the language in order to have hyphenation in transliterated text.
That was possible because Cyrillic and Latin scripts use different code
points, and there are no conflicts in patterns.

So I suggest merging the patterns for Serbian Cyrillic and Latin.

There is another issue when a drop cap is followed by the rest of the
first word and several more words in small caps.
If the first letter is Љ (or one of the other two letters that
transliterate as digraphs), the second letter of the digraph is not
typeset in small caps, because it gets injected before the group that
turns on small caps.
For example:

   \placeinitial
   Љ{\sc уди нису знали}

but this is quite a special case...

Regards,
Ivan

return {
    transliterations = {
        -- c2l: running text, digraph capitals as Lj, Nj, Dž
        ["c2l"] = {
            mapping = {
                ["А"] = "A",  ["а"] = "a",
                ["Б"] = "B",  ["б"] = "b",
                ["В"] = "V",  ["в"] = "v",
                ["Г"] = "G",  ["г"] = "g",
                ["Д"] = "D",  ["д"] = "d",
                ["Ђ"] = "Đ",  ["ђ"] = "đ",
                ["Е"] = "E",  ["е"] = "e",
                ["Ж"] = "Ž",  ["ж"] = "ž",
                ["З"] = "Z",  ["з"] = "z",
                ["И"] = "I",  ["и"] = "i",
                ["Ј"] = "J",  ["ј"] = "j",
                ["К"] = "K",  ["к"] = "k",
                ["Л"] = "L",  ["л"] = "l",
                ["Љ"] = "Lj", ["љ"] = "lj",
                ["М"] = "M",  ["м"] = "m",
                ["Н"] = "N",  ["н"] = "n",
                ["Њ"] = "Nj", ["њ"] = "nj",
                ["О"] = "O",  ["о"] = "o",
                ["П"] = "P",  ["п"] = "p",
                ["Р"] = "R",  ["р"] = "r",
                ["С"] = "S",  ["с"] = "s",
                ["Т"] = "T",  ["т"] = "t",
                ["Ћ"] = "Ć",  ["ћ"] = "ć",
                ["У"] = "U",  ["у"] = "u",
                ["Ф"] = "F",  ["ф"] = "f",
                ["Х"] = "H",  ["х"] = "h",
                ["Ц"] = "C",  ["ц"] = "c",
                ["Ч"] = "Č",  ["ч"] = "č",
                ["Џ"] = "Dž", ["џ"] = "dž",
                ["Ш"] = "Š",  ["ш"] = "š",
            },
        },
        -- C2L: all-capitals text, digraph capitals as LJ, NJ, DŽ
        ["C2L"] = {
            mapping = {
                ["А"] = "A",  ["а"] = "a",
                ["Б"] = "B",  ["б"] = "b",
                ["В"] = "V",  ["в"] = "v",
                ["Г"] = "G",  ["г"] = "g",
                ["Д"] = "D",  ["д"] = "d",
                ["Ђ"] = "Đ",  ["ђ"] = "đ",
                ["Е"] = "E",  ["е"] = "e",
                ["Ж"] = "Ž",  ["ж"] = "ž",
                ["З"] = "Z",  ["з"] = "z",
                ["И"] = "I",  ["и"] = "i",
                ["Ј"] = "J",  ["ј"] = "j",
                ["К"] = "K",  ["к"] = "k",
                ["Л"] = "L",  ["л"] = "l",
                ["Љ"] = "LJ", ["љ"] = "lj",
                ["М"] = "M",  ["м"] = "m",
                ["Н"] = "N",  ["н"] = "n",
                ["Њ"] = "NJ", ["њ"] = "nj",
                ["О"] = "O",  ["о"] = "o",
                ["П"] = "P",  ["п"] = "p",
                ["Р"] = "R",  ["р"] = "r",
                ["С"] = "S",  ["с"] = "s",
                ["Т"] = "T",  ["т"] = "t",
                ["Ћ"] = "Ć",  ["ћ"] = "ć",
                ["У"] = "U",  ["у"] = "u",
                ["Ф"] = "F",  ["ф"] = "f",
                ["Х"] = "H",  ["х"] = "h",
                ["Ц"] = "C",  ["ц"] = "c",
                ["Ч"] = "Č",  ["ч"] = "č",
                ["Џ"] = "DŽ", ["џ"] = "dž",
                ["Ш"] = "Š",  ["ш"] = "š",
            },
        },
    },
}