[NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters

2023-09-22 Thread Hans Hagen via ntg-context

On 9/22/2023 3:51 PM, Hamid,Idris wrote:


-- Original Message --
From "Hans Hagen" <j.ha...@xs4all.nl>
To "ntg-context@ntg.nl" <ntg-context@ntg.nl>
Date 9/22/2023 7:15:25 AM
Subject [NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters


On 9/22/2023 2:39 PM, Hamid,Idris wrote:

b. we want all Unicode control symbols to be suppressed in final pdf output 
(for, e.g., printing).

they basically are unless some font feature keeps them around, which is
out of our control

irr it was you who wanted them to be wiped decades ago as some fonts
visualized them by default

Yes, that's exactly the point: Somewhere along the course of history, it became 
standard for Arabic-script fonts (and other cursive-script fonts as well) to 
include symbols for the control characters.

In typo-rep there is also

%D \starttyping
%D \definefontfeature[default][default][mode=node,formatters=strip]
%D \stoptyping

You included some notes about Khaled, so I guess he faced the same issue. His 
Amiri font displays the symbols by default, as do other Arabic fonts.

(It seems he never considered making it an opentype feature in the font itself, 
but since his focus is/was XeTeX/HB (HB is rather rigid and dictatorial) I 
guess that's not surprising.)


I admit that I don't follow what happens with xetex (they changed the 
renderer at some point indeed) nor HB (I only notice that it gets updates 
frequently in the tex live repository, which makes me wonder how one 
retains compatibility unless one freezes). I actually kept the lib binding 
code that can use it around for your font testing (we wanted to see what 
uniscribe does), not sure if it still works.


Anyway, we're entering the 'bug (or side effect) becomes feature' area 
here; just like yesterday's perfect bidi algorithm is today's less perfect 
one, replaced by ...



But therein lies the problem: ConTeXt shows the rendering by default, and we 
need to turn it off. Since most non-Latin typography targets Uniscribe 
applications which allow for toggling, the font developers (commercial or 
free) don't have to concern themselves with this issue.


if context shows it then it is not a feature but hard coded shapes which 
is weird; how does one know what to 'remove' or not? And in what stage? 
If they are zero width it is simple to ignore them in the backend, if 
they have dimensions (w/h/d) then they contributed and wiping is tricky



Since Word rules the world, most font designers target it. Since Word provides 
for toggling the symbols -- needed for editing purposes -- there was no need 
for Arabic-script font designers to worry about the symbols showing up where 
they are not wanted.

(I suppose that InDesign behaves the same way.)


I don't know ... irr these dtp programs are more like "if you want this 
feature applied select a range of characters and apply it"



That's what was meant when I spoke of the continued effects of the WYSIWYG 
curse: It saved font designers from having to think much about this issue.


In some way it's also a flaw in the open type approach. Basically that 
happens when application stuff becomes a standard and one forgets that 
it was (is) application driven. (And you haven't seen variable fonts and 
color fonts yet ... no pretty standards either.)



Not really -) This brings us to the point of consistency: For Arabic-script fonts, hard 
symbolic rendering of the Unicode control characters is the rule, not the exception. So 
not "an inconsistent mess" -- at least not as far as Arabic-script typography 
is concerned.


Funny rules ... but I'm not going to enable 'wipe' by default: after 
all, one gets what one deserves, not what one likes (which can differ 
per day). But you can enable the wiping. We can of course ignore them in the 
backend when zero width, but then how to explain that they contributed to 
the ht/dp (unless we wipe these dimensions) ... all slow-downers



so you want to see some zwj symbol in a rendered text?

Only in verbatim/\type'd text where it is appropriate, even necessary. Thanks 
to Word/WYSIWYG, the rule is de facto, but it is not de jure -)

Ideally, Scintilla (Scite, Notepad++, etc.) should do the same, or provide a 
toggle, as MS Notepad does.

(Tangent: In terms of Unicode functionality, MS Notepad is still unrivalled, 
even in 2023!)

We agree that for final printed output it is not appropriate (except perhaps in 
a paper that discusses Unicode, fonts, etc., in which case it can be rendered 
using the figures or symbols mechanism -- or toggled as needed.)

so what is it now:

- for verbatim you can use almfixed and they show up (when they have a 
glyph)
- for other fonts if they have them they show up (unless gone in the 
process of rendering)

- but you can wipe them optionally

not sure what more we need

Hans

-

[NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters

2023-09-22 Thread Hamid,Idris

-- Original Message --
From "Hans Hagen" <j.ha...@xs4all.nl>
To "ntg-context@ntg.nl" <ntg-context@ntg.nl>
Date 9/22/2023 7:15:25 AM
Subject [NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters


On 9/22/2023 2:39 PM, Hamid,Idris wrote:

b. we want all Unicode control symbols to be suppressed in final pdf output 
(for, e.g., printing).

they basically are unless some font feature keeps them around, which is
out of our control

irr it was you who wanted them to be wiped decades ago as some fonts
visualized them by default

Yes, that's exactly the point: Somewhere along the course of history, it became 
standard for Arabic-script fonts (and other cursive-script fonts as well) to 
include symbols for the control characters.

In typo-rep there is also

%D \starttyping
%D \definefontfeature[default][default][mode=node,formatters=strip]
%D \stoptyping

You included some notes about Khaled, so I guess he faced the same issue. His 
Amiri font displays the symbols by default, as do other Arabic fonts.

(It seems he never considered making it an opentype feature in the font itself, 
but since his focus is/was XeTeX/HB (HB is rather rigid and dictatorial) I 
guess that's not surprising.)



But therein lies the problem: ConTeXt shows the rendering by default, and we 
need to turn it off. Since most non-Latin typography targets Uniscribe 
applications which allow for toggling, the font developers (commercial or 
free) don't have to concern themselves with this issue.

?

Since Word rules the world, most font designers target it. Since Word provides 
for toggling the symbols -- needed for editing purposes -- there was no need 
for Arabic-script font designers to worry about the symbols showing up where 
they are not wanted.

(I suppose that InDesign behaves the same way.)

That's what was meant when I spoke of the continued effects of the WYSIWYG 
curse: It saved font designers from having to think much about this issue.



Not really -) This brings us to the point of consistency: For Arabic-script 
fonts, hard symbolic rendering of the Unicode control characters is the rule, 
not the exception. So not "an inconsistent mess" -- at least not as far as 
Arabic-script typography is concerned.

so you want to see some zwj symbol in a rendered text?

Only in verbatim/\type'd text where it is appropriate, even necessary. Thanks 
to Word/WYSIWYG, the rule is de facto, but it is not de jure -)

Ideally, Scintilla (Scite, Notepad++, etc.) should do the same, or provide a 
toggle, as MS Notepad does.

(Tangent: In terms of Unicode functionality, MS Notepad is still unrivalled, 
even in 2023!)

We agree that for final printed output it is not appropriate (except perhaps in 
a paper that discusses Unicode, fonts, etc., in which case it can be rendered 
using the figures or symbols mechanism -- or toggled as needed.)

I hope the above made sense.

Best wishes
--
Idris Samawi Hamid, Professor
Department of Philosophy
Colorado State University
Fort Collins, CO 80523
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / https://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : https://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : https://contextgarden.net
___

[NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters

2023-09-22 Thread Hamid,Idris

-- Original Message --
From "Hans Hagen" <j.ha...@xs4all.nl>
To "ntg-context@ntg.nl" <ntg-context@ntg.nl>
Date 9/22/2023 7:03:34 AM
Subject [NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters


Hi,

I found it ...

\startbuffer
\definedfont[almfixed*default]hello w\zwnj o\zwj r\zwnj l\zwj d
\stopbuffer

\getbuffer

\start
\setcharacterstripping[1]
\getbuffer
\stop

so now, being its only user, you have to wikify it ...

Ah, there it is! Many thanks, will wikify it.
Best wishes
Idris

--
Idris Samawi Hamid, Professor
Department of Philosophy
Colorado State University
Fort Collins, CO 80523

[NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters

2023-09-22 Thread Hans Hagen

On 9/22/2023 2:39 PM, Hamid,Idris wrote:


b. we want all Unicode control symbols to be suppressed in final pdf output 
(for, e.g., printing).

they basically are unless some font feature keeps them around, which is
out of our control


irr it was you who wanted them to be wiped decades ago as some fonts 
visualized them by default



But therein lies the problem: ConTeXt shows the rendering by default, and we 
need to turn it off. Since most non-Latin typography targets Uniscribe 
applications which allow for toggling, the font developers (commercial or 
free) don't have to concern themselves with this issue.


?


Not really -) This brings us to the point of consistency: For Arabic-script fonts, hard 
symbolic rendering of the Unicode control characters is the rule, not the exception. So 
not "an inconsistent mess" -- at least not as far as Arabic-script typography 
is concerned.


so you want to see some zwj symbol in a rendered text?
 Hans


-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-



[NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters

2023-09-22 Thread Hans Hagen

Hi,

I found it ...

\startbuffer
\definedfont[almfixed*default]hello w\zwnj o\zwj r\zwnj l\zwj d
\stopbuffer

\getbuffer

\start
\setcharacterstripping[1]
\getbuffer
\stop

so now, being its only user, you have to wikify it ...

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-



[NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters

2023-09-22 Thread Hamid,Idris


-- Original Message --
From "Hans Hagen via ntg-context" <ntg-context@ntg.nl>
To "ntg-context@ntg.nl" <ntg-context@ntg.nl>
Cc "Hans Hagen" <j.ha...@freedom.nl>
Date 9/22/2023 3:53:03 AM
Subject [NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters

1. Can this approach be generalized to get what we want, viz., a way to toggle 
the symbols?

given the inconsistency in what is or is not in a font the only way out
is to have our own visualization (consistent across fonts) and even then
it would add some mess because we're talking of a mix of characters that
can have gone (as part of rendering) or are not characters at all but
spacing

so, in that case only 'verbatim' is a candidate for visualization, not
so much typeset text

Hm, ok. Since almfixed is based on Knuth's mono, perhaps its visuals of the 
control characters can be extracted and used as fallback symbols.

Yes: For typeset text/printing visualization is generally unnecessary (the 
point of this thread).

2. \enabletrackers[typesetters.nbsp] gives a colored box, which is at least 
something.. But how can we get the NBSP symbol that's already in the font?

it's gone by that time ... the line break mechanism uses glue, not
characters

Ok

3. Ideally:
a. we want all Unicode control symbols to show up in verbatim or in \typebuffer 
(as in a text editor);

only there (with some non interfering rendering i guess) and even then
it's probably an additional pass over the node list

Ok, that would be good.

b. we want all Unicode control symbols to be suppressed in final pdf output 
(for, e.g., printing).

they basically are unless some font feature keeps them around, which is
out of our control

If the symbols are in the font, then they are not suppressed. See below.

But some fonts meant for printing have symbols for Unicode control chars -- 
that poses a challenge.

so an inconsistent mess not worth wasting time on (as this is hobbyism,
only fun can be a motivational factor)

But there is a certain consistency -- see below.

And some fonts meant for verbatim/editing do not have symbols for the control 
chars -- that also poses a challenge. AlmFixed, of course, has them.

Most minimally decent Arabic fonts have symbols for the Unicode control chars 
as default, including Scheherazade, Amiri, Uthmanic, and Noto Naskh Arabic -- 
all free fonts.

Industry workhorses like Linotype Lotus (Arabic) also have them.

i'm not interested in those .. can't afford them for playing around
purposes .. we only look into commercial fonts if we get a dozen
unrestricted copies for context developers

Except for Linotype Lotus, each of the Arabic-script fonts mentioned above is 
free, not commercial -)

(There is also a free version of Lotus -- it also has the symbolic rendering of 
the control chars.)

Uniscribe applications like Notepad/Word allow for toggling in a WYSIWYG 
context -- can't speak for HarfBuzz -- so there is no harm in having explicit 
symbols in the font.

sure, as long as there is no rendering ... they show the input

But therein lies the problem: ConTeXt shows the rendering by default, and we 
need to turn it off. Since most non-Latin typography targets Uniscribe 
applications which allow for toggling, the font developers (commercial or 
free) don't have to concern themselves with this issue.

Yet another curse of the WYSIWYG paradigm, which mixes form and content -)

The upshot is that, for non-Latin scripts, some toggling capability in ConTeXt 
is important to have -- even inescapable for Arabic-script publishing.

a bit subjective arguing -)

Not really -) This brings us to the point of consistency: For Arabic-script 
fonts, hard symbolic rendering of the Unicode control characters is the rule, 
not the exception. So not "an inconsistent mess" -- at least not as far as 
Arabic-script typography is concerned.

(Yes, for the upcoming Husayni I can add a font feature that does the trick, 
but that will be an exception to the rule.)

Perhaps others who use Arabic-script or Indic, etc., can chime in.. Am hopeful 
that we can figure something out!

sure, but not with 'instant priority' (unless it is some project)

My immediate project (no Husayni) is a book that features English translation 
of an Arabic text (hence the interest in the recent streams thread). Using some 
Unicode control characters will be unavoidable to get the rendering effects 
correct, but the symbols will need to be suppressed.

Am thinking/hoping that a ConTeXt-specific font feature can do the trick. Since 
there appears to be consistency across Arabic fonts in this matter it should 
not be messy at all, simply a fallback that sends the symbols to some no-man's 
land.

(A thought: Some of the code you kindly provided for transliteration might be 
reusable as well.. But a general solution for all ConTeXt users would be ideal.)

In any case, many thanks for your help in thinking this 

[NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters

2023-09-22 Thread Hans Hagen via ntg-context

On 9/22/2023 6:16 AM, Hamid,Idris wrote:



-- Original Message --
From "Hans Hagen" <j.ha...@xs4all.nl>
To "Hamid,Idris" <idris.ha...@colostate.edu>; "mailing list for ConTeXt users" <ntg-context@ntg.nl>
Date 9/21/2023 3:29:22 PM
Subject Re: [NTG-context] Re: Toggling the symbol for the zero-width joiner and related Unicode control characters

Many thanks, Hans. The method appears to work only for nbsp, not zwj etc. 
Here is the updated MWE:

===
\startTEXpage[offset=1em]
\disabletrackers[typesetters.directions]
\disabletrackers[typesetters.zwj]
\disabletrackers[typesetters.zwnj]
\disabletrackers[typesetters.nbsp]
\definedfont[almfixed at 14pt]
ZWJ: ‌
ZWNJ: ‍
NBSP:
\stopTEXpage
===

See attached, please advise.

joiners are part of replacement etc and can come and go ... they are
characters (we could visualize them but one never knows for sure if one
sees them)

nbsp are spaces and become glue that we can trace reliably in the node list

Many thanks. Ok, here is another MWE featuring a workaround using fallbacks:

==
\definefontfallback[nosymbols] [file:lmmono10-regular] [200C,200D] [force=yes]
\starttypescript [serif] [alm] [name]
 \definefontsynonym [Serif] [ArabicLatinSerif]
\stoptypescript
\starttypescript [mono] [alm] [name]
 \definefontsynonym [Mono]  [ArabicLatinMono]
\stoptypescript
\starttypescript [serif] [alm]
 \definefontsynonym [ArabicLatinSerif] [file:almfixed] % [fallbacks=nosymbols]
\stoptypescript
\starttypescript [mono] [alm]
 \definefontsynonym [ArabicLatinMono] [file:almfixed] [fallbacks=nosymbols]
\stoptypescript
\starttypescript [almfixed-nosymbols]
\definetypeface [\typescriptone] [rm] [serif] [alm] [default]
\definetypeface [\typescriptone] [tt] [mono] [alm] [default]
\stoptypescript
\usetypescript[almfixed-nosymbols]
\setupbodyfont[almfixed-nosymbols,12pt]
\startTEXpage[offset=1em]
\rm
ZWJ: ‌
ZWNJ: ‍
NBSP:
\tt
ZWJ: ‌
ZWNJ: ‍
NBSP:
\stopTEXpage
==

Under \rm we get the symbols, and under \tt they are suppressed. Of course it 
doesn't matter what fallback font one uses, as long as it has no 
control-character symbols.

1. Can this approach be generalized to get what we want, viz., a way to toggle 
the symbols?


given the inconsistency in what is or is not in a font the only way out 
is to have our own visualization (consistent across fonts) and even then 
it would add some mess because we're talking of a mix of characters that 
can have gone (as part of rendering) or are not characters at all but 
spacing


so, in that case only 'verbatim' is a candidate for visualization, not 
so much typeset text



2. \enabletrackers[typesetters.nbsp] gives a colored box, which is at least 
something.. But how can we get the NBSP symbol that's already in the font?


it's gone by that time ... the line break mechanism uses glue, not 
characters



3. Ideally:
a. we want all Unicode control symbols to show up in verbatim or in \typebuffer 
(as in a text editor);


only there (with some non interfering rendering i guess) and even then 
it's probably an additional pass over the node list



b. we want all Unicode control symbols to be suppressed in final pdf output 
(for, e.g., printing).


they basically are unless some font feature keeps them around, which is 
out of our control



But some fonts meant for printing have symbols for Unicode control chars -- 
that poses a challenge.


so an inconsistent mess not worth wasting time on (as this is hobbyism, 
only fun can be a motivational factor)



And some fonts meant for verbatim/editing do not have symbols for the control 
chars -- that also poses a challenge.  AlmFixed, of course, has them.

Most minimally decent Arabic fonts have symbols for the Unicode control chars 
as default, including Scheherazade, Amiri, Uthmanic, and Noto Naskh Arabic -- 
all free fonts.

Industry workhorses like Linotype Lotus (Arabic) also have them.


i'm not interested in those .. can't afford them for playing around
purposes .. we only look into commercial fonts if we get a dozen 
unrestricted copies for context developers



Uniscribe applications like Notepad/Word allow for toggling in a WYSIWYG 
context -- can't speak for HarfBuzz -- so there is no harm in having explicit 
symbols in the font.


sure, as long as there is no rendering ... they show the input


The upshot is that, for non-Latin scripts, some toggling capability in ConTeXt 
is important to have -- even inescapable for Arabic-script publishing.


a bit subjective arguing -)


Perhaps others who use Arabic-script or Indic, etc., can chime in.. Am hopeful 
that we can figure something out!

sure, but not with 'instant priority' (unless it is some project)

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl