Re: [NTG-context] Unicode normalization and Hebrew in ConTeXt

2021-08-18 Thread Hans Hagen via ntg-context

On 8/17/2021 9:46 PM, Joey McCollum wrote:
> Thankfully, it looks like this was just a problem with my implementation
> of the OpenType feature and not with ConTeXt's handling of it! (I
> worried that it might be ConTeXt when I saw that XeLaTeX was handling the
> feature correctly.) Hans graciously helped me identify the problem, and
> everything looks good now!

Just for the record: it is best to make a font as robust as
possible and not rely on side effects (ambiguous cases). When Idris and
I tested some shapers we found that there can be inconsistent results
(fwiw, with a rather complex font ConTeXt agreed more often with uniscribe
than xetex, but in the end one has to make the font okay for all, I guess).


When we started with opentype (luatex showed up in 2005) we took 
uniscribe as reference so that is our benchmark. And lack of specs made 
us figure out things stepwise. Now, if something works in one shaper and 
not in another it can of course be due to bugs but it can also be that 
the spec is simply fuzzy and choices have been made. There is then the 
danger that eventually bugs become features (I assume the amount of 
leverage matters here, and tex has zero) which then settles it (kind of) 
but that doesn't mean that one should gamble on it.


The same is true for fontnames: don't rely too much on the heuristics 
hard coded in programs (e.g. fontforge has some for font names, 
properties, glyph names, and although that is nice for recovery, it also 
makes other usage hard because fighting fuzzy heuristics is hard once 
information is lost).


Btw, a side effect of your 'issue' is that I found a way to save some 
memory for some fonts (for now only in lmtx) at the cost of hopefully 
little extra runtime.


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___


Re: [NTG-context] Unicode normalization and Hebrew in ConTeXt

2021-08-17 Thread Joey McCollum via ntg-context
Thankfully, it looks like this was just a problem with my implementation of
the OpenType feature and not with ConTeXt's handling of it! (I worried that
it might be ConTeXt when I saw that XeLaTeX was handling the feature
correctly.) Hans graciously helped me identify the problem, and everything
looks good now!

Joey



Re: [NTG-context] Unicode normalization and Hebrew in ConTeXt

2021-08-17 Thread Joey McCollum via ntg-context
Shouldn't dlig automatically be enabled under the "hebrew" feature set? In
font-pre.mkiv, hebrew inherits from semitic-complete, which sets dlig=yes.

Still, if I explicitly add dlig, as in the following example, things
change, but they still aren't right:

```

\starttypescriptcollection[keteryg]

\starttypescript[serif][keteryg]

\definefontsynonym[Serif][file:../fonts/KeterYG/KeterYG-Medium.ttf][features=hebrew]
% all the necessary Hebrew features, including dlig

\stoptypescript


\starttypescript[keteryg]

\definetypeface[keteryg][rm][serif][keteryg][default]

\stoptypescript

\stoptypescriptcollection


%Set up the main font:

\setupbodyfont[keteryg]

%Set up right-to-left alignment:

\setupalign[r2l]

%Explicitly add dlig (in case it wasn't there already):

\definefontfeature[plus-dlig][dlig=yes]


\starttext

\addff{plus-dlig}

שֹׂבַע

עָשׂוֹר

קֹשֶׁט

שֹׁשַׁנִּים

עָשׂוֹר

מֹשֶׁה

שַׁלֹשׁ

\stoptext
```

In examples 1, 3, 4, and 6, the *holam* of the preceding
letter (which should have been stripped in the contextual substitution)
just seems to have been moved farther up. In fact, the output looks like it
would look if I turned off the reordercombining feature. (And indeed, if I
manually reorder the glyphs to the Hebrew Layout Intelligence order, then
the results look like they did when I just used the "hebrew" feature.)
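
An alternative to switching dlig on with \addff is to put it into a derived
feature set when the font is defined, so that the enabled features are
visible in one place. A minimal sketch, following the
\definefontfeature[name][parent][settings] pattern used elsewhere in this
thread; the name hebrew-dlig is made up for illustration, and this has not
been tested against Keter YG. It only makes the feature selection explicit
and is not expected to change how the lookups themselves are applied:

```
% hypothetical feature set name; inherits everything from "hebrew"
\definefontfeature[hebrew-dlig][hebrew][dlig=yes]

\starttypescript[serif][keteryg]
\definefontsynonym[Serif][file:../fonts/KeterYG/KeterYG-Medium.ttf][features=hebrew-dlig]
\stoptypescript
```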


I may have forgotten to attach the font file I was using for this test. If
that is the case, it is available at https://github.com/jjmccollum/Keter-YG.


Joey



Re: [NTG-context] Unicode normalization and Hebrew in ConTeXt

2021-08-17 Thread Hans Hagen via ntg-context

On 8/17/2021 2:07 AM, Joey McCollum wrote:

> Sorry to bring this up after over a year, but I just noticed something
> that doesn't seem right. I implemented some contextual substitutions in
> my own fork of the Keter YG Hebrew font (.ttf file attached) under the
> "dlig" feature that should do the following two things:

but you don't enable dlig



Re: [NTG-context] Unicode normalization and Hebrew in ConTeXt

2021-08-16 Thread Joey McCollum via ntg-context
Hans,

Sorry to bring this up after over a year, but I just noticed something that
doesn't seem right. I implemented some contextual substitutions in my own
fork of the Keter YG Hebrew font (.ttf file attached) under the "dlig"
feature that should do the following two things:

   1. If a *shin* with a *sin* dot (שׂ) is pointed with a *holam* (the
   vowel point placed high and on the left), then the *shin*, *sin* dot,
   and *holam* are combined into a single ligature that depicts the *sin* dot
   and *holam* merged into a single point.
   2. If a *shin* with a *shin* dot (שׁ) follows another letter pointed
   with a *holam* (except for *vav*, which must be pointed with a *holam
   haser*), then the *shin* and *shin* dot are replaced with a ligature that
   moves the *shin* dot a bit to the right (so that it appears to be merged
   with the preceding *holam*), and the combination of the preceding letter
   and the actual *holam* is changed to just the preceding letter (thus
   effectively stripping the old *holam*).

I've tested both of these features in FontForge, and they work as expected
there. Likewise, if I test them in the following XeLaTeX script, XeLaTeX
handles both rules correctly:

```
\documentclass{article}
%Set fonts and font features:
\usepackage{fontspec}
\setmainfont[Path=../fonts/KeterYG/, UprightFont = *-Medium, Script=Hebrew,
Ligatures=Discretionary]{KeterYG} % I'm using a local copy of the attached font
\begin{document}
שֹׂבַע

עָשׂוֹר

קֹשֶׁט

שֹׁשַׁנִּים

עָשׂוֹר

מֹשֶׁה

שַׁלֹשׁ
\end{document}
```

But in ConTeXt, only rule (1) above works as expected. Here is a minimal
(non-)working example:

```

\starttypescriptcollection[keteryg]

\starttypescript[serif][keteryg]

\definefontsynonym[Serif][file:../fonts/KeterYG/KeterYG-Medium.ttf][features=hebrew]
% use a local copy of the attached font, with all the necessary Hebrew
% features (this includes dlig by default)

\stoptypescript


\starttypescript[keteryg]

\definetypeface[keteryg][rm][serif][keteryg][default]

\stoptypescript

\stoptypescriptcollection


%Set up the main font:

\setupbodyfont[keteryg]

%Set up right-to-left alignment:

\setupalign[r2l]

\starttext

שֹׂבַע

עָשׂוֹר

קֹשֶׁט

שֹׁשַׁנִּים

עָשׂוֹר

מֹשֶׁה

שַׁלֹשׁ

\stoptext
```

In examples 3, 4, 6, and 7, the *holam* dot still appears before the
*shin*-with-merged-*shin*-dot-and-*holam* ligature, when it should be absent.
(I realize that it may be difficult to tell; in the last two examples, the
presence of two dots is easier to make out.)

Do you have any idea why this might be happening in ConTeXt? Does the glyph
reordering in font-imp-combining.lua take place before any OpenType
features in the font are applied?
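
One way to probe the ordering question is to take the input order out of
the equation by spelling a test word with explicit code points. A minimal
sketch, assuming the "hebrew" feature set and the font path from the example
above; the font name \KeterTest is made up for illustration, and the file
has not been run against Keter YG. It sets מֹשֶׁה twice: once with the marks
on the *shin* in Unicode canonical order (segol, combining class 16, before
the *shin* dot, class 24) and once with the *shin* dot directly after the
*shin*:

```
% \KeterTest is a hypothetical name; the path is the one used above
\definefont[KeterTest][file:../fonts/KeterYG/KeterYG-Medium.ttf*hebrew at 14pt]

\setupalign[r2l]

\starttext
\KeterTest
% canonical order: mem, holam, shin, segol, shin dot, he
\utfchar{"05DE}\utfchar{"05B9}\utfchar{"05E9}\utfchar{"05B6}\utfchar{"05C1}\utfchar{"05D4}
\par
% shin dot first: mem, holam, shin, shin dot, segol, he
\utfchar{"05DE}\utfchar{"05B9}\utfchar{"05E9}\utfchar{"05C1}\utfchar{"05B6}\utfchar{"05D4}
\stoptext
```

If the two lines come out differently, that would suggest that the order in
which the marks reach the font's lookups matters for rule (2), rather than
the dlig feature simply being off.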

Thanks again!

Joey



Re: [NTG-context] Unicode normalization and Hebrew in ConTeXt

2020-04-30 Thread Joey McCollum
Okay! I have not figured out how to add a new page to the wiki, but I was
able to add a section to the end of the "Arabic and Hebrew" page (
https://www.contextgarden.net/Arabic_and_Hebrew) discussing the issue,
providing a test, and briefly describing the fix.

Joey



Re: [NTG-context] Unicode normalization and Hebrew in ConTeXt

2020-04-30 Thread Hans Hagen

On 4/30/2020 4:28 PM, Joey McCollum wrote:
> Thanks so much, Hans! I should be able to add a wiki page summarizing
> the tests before the end of the week.
>
> For reference purposes, do you know which version of ConTeXt has (or
> will have) this update included?

today's upload




Re: [NTG-context] Unicode normalization and Hebrew in ConTeXt

2020-04-30 Thread Joey McCollum
Thanks so much, Hans! I should be able to add a wiki page summarizing the
tests before the end of the week.

For reference purposes, do you know which version of ConTeXt has (or will
have) this update included?

Joey



Re: [NTG-context] Unicode normalization and Hebrew in ConTeXt

2020-04-30 Thread Hans Hagen

On 4/28/2020 1:59 PM, Joey McCollum wrote:

> ...

> My question is, can ConTeXt with LuaTeX handle the same situation
> correctly? In the following minimal example, ConTeXt typesets pointed
> Hebrew correctly when the characters are in the typographically
> recommended order, but not when they are in Unicode canonical order:

We (Joey and I) figured out how to best deal with this. As a result the
predefined hebrew feature now will do the right thing for fonts that
assume some specific ordering. So, this should work okay:


\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=hebrew]

in the most recent upload.

Maybe there should be a wiki page that summarizes tests with hebrew 
fonts (but I leave that up to Joey).


Hans





Re: [NTG-context] Unicode normalization and Hebrew in ConTeXt

2020-04-28 Thread Hans Hagen

On 4/28/2020 1:59 PM, Joey McCollum wrote:

> \definefontfeature[f:pointedhebrew][default][
>      ccmp=yes,
>      mark=yes,
>      script=hebr
> ]
> \definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew]
> %Set the body font:
> \setupbodyfont[hebrew]
> %Set up right-to-left alignment:
> \setupalign[r2l]
> \starttext
>      %Characters after normalization, in Unicode canonical order (bet + segol + dagesh + final nun):
>
>      בֶּן
>
>      %A word with characters in typographically recommended order (bet + dagesh + segol + final nun):
>
>      בֶּן
> \stoptext


\startluacode
fonts.handlers.otf.addfeature {
    name    = "normalizehebrew",
    type    = "chainsubstitution",
    prepend = 1,
    lookups = {
        {
            type = "multiple",
            data = {
                -- segol (U+05B6) becomes dagesh (U+05BC) + segol
                [0x5B6] = { 0x5BC, 0x5B6 },
            },
        },
    },
    data = {
        rules = {
            {
                -- context to match: segol followed by dagesh
                current = { { 0x5B6 }, { 0x5BC } },
                lookups = { 1, 0 },
            },
        },
    },
}
\stopluacode

\definefontfeature
  [f:pointedhebrew]
  [hebrew]
  [normalizehebrew=yes]

\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew]

\setupbodyfont[hebrew]

\setupalign[r2l]

\starttext
בֶּן \quad בֶּן \par
\stoptext

How many such reorderings are there? (I saw some document about that
font and it sounds a bit messy wrt all these input variants.)


(there are several mechanisms in context to deal with such issues, it's 
all about getting specs from users i.e. tex is all about control so in 
principle it should be doable)


Hans



[NTG-context] Unicode normalization and Hebrew in ConTeXt

2020-04-28 Thread Joey McCollum
I am typesetting a document in Hebrew that includes pointing (e.g., vowels,
shin and sin dots, dagesh, etc.) using ConTeXt. The Hebrew text that I want
to typeset has been normalized into Unicode's NFC canonical form. It is
well-known that the Unicode canonical ordering of Hebrew points conflicts
with the recommended mark ordering of specific points based on their
functions (see https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf
for more on this topic). Thankfully, many typesetting engines automatically
reorder the points to ensure that they are combined according to the
specifications of many fonts. I'm pretty sure that XeLaTeX is one of these,
as it typesets Hebrew letters with multiple points correctly even when the
Hebrew text is in NFC form.

My question is, can ConTeXt with LuaTeX handle the same situation
correctly? In the following minimal example, ConTeXt typesets pointed
Hebrew correctly when the characters are in the typographically recommended
order, but not when they are in Unicode canonical order:

```
%Setup Hebrew text font:
\definefontfeature[f:pointedhebrew][default][
ccmp=yes,
mark=yes,
script=hebr
]
\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew]
%Set the body font:
\setupbodyfont[hebrew]
%Set up right-to-left alignment:
\setupalign[r2l]
\starttext
%Characters after normalization, in Unicode canonical order (bet + segol + dagesh + final nun):
בֶּן

%A word with characters in typographically recommended order (bet + dagesh + segol + final nun):
בֶּן
\stoptext
```

I typeset this using ConTeXt version 2020.03.10, as released with TeXLive
2020. I got the SBL Hebrew font from
https://www.sbl-site.org/educational/BiblicalFonts_SBLHebrew.aspx.
According to the font's user manual (see the link above the MWE), the font
should be able to combine the marks to form the correct glyph regardless of
their order after the consonant, but that doesn't seem to be the case here.
I also tried using the predefined "hebrew" featureset, but that did not
change anything.
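
Because the two test words look identical on screen (and a mail client or
archive may re-normalize them), a variant of the test that spells out the
code points may be easier to reproduce. A minimal sketch, assuming the same
font setup as in the example above; \utfchar takes the code point as a
hexadecimal number:

```
\definefontfeature[f:pointedhebrew][default][ccmp=yes,mark=yes,script=hebr]
\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew]
\setupbodyfont[hebrew]
\setupalign[r2l]
\starttext
% Unicode canonical order: bet + segol + dagesh + final nun
\utfchar{"05D1}\utfchar{"05B6}\utfchar{"05BC}\utfchar{"05DF}
\par
% typographically recommended order: bet + dagesh + segol + final nun
\utfchar{"05D1}\utfchar{"05BC}\utfchar{"05B6}\utfchar{"05DF}
\stoptext
```

The canonical order follows from the combining classes: segol (U+05B6) has
class 16 and dagesh (U+05BC) has class 21, so normalization sorts the segol
first.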

Is there some other OpenType feature or featureset I need to enable to fix
this, or is there some module or option I can include to get ConTeXt to
typeset Unicode-normalized Hebrew as if it were ordered in the recommended
way, like XeLaTeX does? I see that the uninormalize module is mentioned in
the thread "XeLaTeX, LuaLaTeX, fontspec, unicode and normalization" on TeX
Stack Exchange (
https://tex.stackexchange.com/questions/229044/xelatex-lualatex-fontspec-unicode-and-normalization);
can that be used with ConTeXt?

Thank you,

Joey