On Wed, 18 Dec 2024 23:29:58 GMT, Phil Race <p...@openjdk.org> wrote:

>> Soft hyphens should never render, regardless of the rendering path used 
>> internally.
>> 
>> This PR does not expand the categorization of "complex" characters in 
>> `FontUtilities` in order to force the use of `TextLayout` rendering code 
>> paths (as was discussed in JBS).
>> 
>> Instead, it takes the existing (limited) format-category checks in 
>> `sun.font.CMap` (a TrueType font helper class), expands it to a more general 
>> / complete default-ignorable check 
>> (`FontUtilities.isDefaultIgnorable(int)`), and then moves these checks out 
>> of `CMap` and up a level into the `CharToGlyphMapper` classes themselves.
>> 
>> The Type1 and TTF glyph mappers have been updated, but the macOS glyph 
>> mapper has not been updated.
>
> One thing this fix does is change the logic which we have today
> FROM
>  - Let the font have first say. If it maps the code point, use what the font 
> provides,
>    the font should know what it is doing. And only map to zero if the font 
> does not support the code point
> TO
> - Never mind what the font says, these are always invisible and zero width.
> 
> 
> I have no easy way to know how this is going to work out in practice.
> Is there a particular reason you adopted this model ?
> Also there are some changes in the code points being checked for.
> Eg
> you added 0x200B ..
> you no longer check for 0x2028-0x2029
> you added a check for 0x2060-0x2069
> 
> The additions probably make sense but I'm not sure why you removed some .. it 
> is not explained.
> Perhaps because unicode does not designate it a "Format" character .. but we 
> should still be treating
> it as invisible.
> 
> It would be helpful to add comments say what the characters are in the range, 
> as there is now a lot !
> If it is a range like 0x2060-0x206F, then a summary of the 16 code points 
> would be enough
> 
> Also, I'd like you to say something about the embedded fonts in the test.
> Superficially, it appears you generated these from scratch using fontforge 
> and the glyphs all display as a simple line.
> But I do need to understand the complete provenance of these.

@prrace, @alisenchung Thanks for your comments!

> What automated checks?

You're right, I had vastly overestimated the scope of the automated tier1 test 
runs on GitHub. I'll keep that in mind.

> you will need to find machines to run all the tests yourself, across all 
> platforms

I was able to get my hands on a Mac, and have updated the Mac-specific code as 
well. I'm now covered on Linux, Windows and Mac. But if I ever need to test on 
AIX, I'll know I took a wrong turn somewhere.

> One thing this fix does is change the logic which we have today [...] TO 
> Never mind what the font says, these are always invisible and zero width

Correct, this is the core of the issue in this bug: there are fonts containing 
glyphs for e.g. soft hyphen, but these glyphs should not be displayed because 
these characters are never displayed in and of themselves. This seems to match 
how HarfBuzz behaves, but the current Java logic does not mirror it, leading to 
inconsistent results depending on the internal code paths used to render or 
measure text.

See the Unicode description of "default-ignorable" characters 
(http://www.unicode.org/L2/L2002/02368-default-ignorable.html):

> Default-ignorable code points are those that should be ignored by default in 
> rendering (unless explicitly supported). They have no visible glyph or 
> advance width in and of themselves, although they may affect the display, 
> positioning, or adornment of adjacent or surrounding characters. [...] This 
> does not imply that default-ignorable code points must always be invisible: 
> an implementation can show a visible glyph on request, such as in a "Show 
> Hidden" mode. A particular use of a "Show Hidden" mode is to show a visible 
> indication of "misplaced" or "ineffectual" formatting codes. [...] There are 
> other characters that have no visible glyphs: the white-space characters. 
> These typically have advance-width, however. The line separation characters 
> such as CR do not clearly exhibit this advance-width since they are always at 
> the end of a line, but  in most GUIs show a visible advance width when 
> selected.

The `FontUtilities.isDefaultIgnorable` function matches the HarfBuzz 
classification, including a few exceptions made for the sake of Uniscribe 
compatibility:

https://github.com/openjdk/jdk/blob/b2811a0ccd9664d11770980c47424ab6723cbbc9/src/java.desktop/share/native/libharfbuzz/hb-unicode.hh#L128

> you added 0x200B

Correct, zero-width space is defined as having the Default_Ignorable_Code_Point 
property, i.e. it should not be displayed (by default) even if the font has a 
glyph for it. See 
https://www.unicode.org/Public/16.0.0/ucd/DerivedCoreProperties.txt (search for 
the "Default_Ignorable_Code_Point" section).

> you no longer check for 0x2028-0x2029
> you added a check for 0x2060-0x2069

Correct, based on the Unicode spec and the corresponding HarfBuzz 
implementation, which I think we want the Java layout code to be consistent 
with.

I think 0x2028-0x2029 in particular (line separator, paragraph separator) are 
considered whitespace and not ignorable (see the last sentence in the quote 
above from the unicode.org website).

> It would be helpful to add comments saying what the characters are in the 
> range

Done.

> Also, I'd like you to say something about the embedded fonts in the test.
> it appears you generated these from scratch using fontforge and the glyphs 
> all display as a simple line

Correct, I've added the script to the corresponding JavaDoc for full 
transparency.

I did have one thought that I wanted to run by you guys: it could be argued 
that `FontUtilities.isDefaultIgnorable` should be native and should just 
delegate to HarfBuzz's `is_default_ignorable`. That's not the approach I 
decided to take, but let me know if you disagree.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22670#issuecomment-2556033513

Reply via email to