On Tue, 29 Apr 2025 09:55:56 GMT, Nikita Gubarkov <ngubar...@openjdk.org> wrote:

>> It looks like this regression actually fits into a longer series of fixes / 
>> regressions in this area:
>> 
>> - [JDK-4517298](https://bugs.openjdk.org/browse/JDK-4517298) fixed metrics 
>> for zero-width characters, but broke some ligatures / glyph substitutions
>> - [JDK-7017058](https://bugs.openjdk.org/browse/JDK-7017058) fixed the 
>> ligatures / glyph substitutions, but broke some zero-width metrics
>> - [JDK-8208377](https://bugs.openjdk.org/browse/JDK-8208377) fixed some 
>> metrics and rendering for zero-width characters, but broke some ligatures / 
>> glyph substitutions
>> - Now, with this PR, we aim to fix the ligatures without re-breaking 
>> zero-width metrics and display
>> 
>> We have two different types of use cases pulling `CharToGlyphMapper` in two 
>> different directions: the users who need raw, untransformed glyph info, and 
>> the users who need normalized / transformed glyph info.
>> 
>> It looks to me like, in the current code base, the only `CharToGlyphMapper` 
>> user which requires raw font data is HarfBuzz (explicitly confirmed with the 
>> HarfBuzz team here: https://github.com/harfbuzz/harfbuzz/discussions/5234).
>> 
>> The regression mechanism at play here is that the HarfBuzz font callbacks 
>> are currently providing HarfBuzz with transformed glyph info (e.g. ZWJ -> 
>> INVISIBLE_GLYPH_ID), which prevents HarfBuzz from recognizing and applying 
>> the correct font GSUB substitutions (which involve ZWJ).
>> 
>> In order to fix this without (yet again) breaking metrics and display 
>> behavior elsewhere, I've added two methods to `CharToGlyphMapper` which 
>> provide access to raw glyph info, to be used by the HarfBuzz font callbacks: 
>> `charToGlyphRaw(int)` and `charToVariationGlyphRaw(int)`.
>> 
>> Note two intricacies related to `CompositeGlyphMapper`:
>> 1. We need to be careful to only cache raw (untransformed) values, to avoid 
>> conflicts between requests for a raw version of a glyph and a transformed 
>> version of the same glyph. Another option would have been two separate 
>> caches, but I don't think that's necessary.
>> 2. Consumers who are using `CompositeGlyphMapper.SLOTMASK` to check glyph 
>> slots (e.g. `FontRunIterator` and `CTextPipe`) will "see" invisible glyphs 
>> as having come from slot 0. This isn't new, and I think it's OK, but 
>> something to be aware of.
>> 
>> The glyph cache handling in `CCharToGlyphMapper` (for macOS) also requires 
>> care to avoid mixing value types.
>> 
>> Please also note that I'm not sure if the tweak to `sunFont.c` is being 
>> tested, since FFM is being used by default for Harf...
>
> By the way, I see that in each implementation, both `charToGlyph` and 
> `charToGlyphRaw` call a common method, like `getGlyph(int uniciode, boolean 
> raw)`. At first there was just `charToGlyph`, then `charToVariationGlyph` was 
> added and now you added a "raw" version for each of them, I see that in the 
> future we will need other variants and how it's already starting an 
> exponential explosion. Overriding all of those methods in each implementation 
> brings quite a bit of boilerplate, and it becomes easier to miss something. 
> Maybe take a step back and refactor this into a single `charToGlyph(int 
> unicode, int variationSelector, boolean raw)` version?
> Also, this `raw` parameter only really controls `isDefaultIgnorable` check in 
> the end of each method. Maybe we could factor this out without bringing it 
> separately into each mapper implementation?

@YaaZ: Thanks for the additional feedback, please see my thoughts below:

> By the way, I see that in each implementation, both charToGlyph and 
> charToGlyphRaw call a common method, like getGlyph(int uniciode, boolean 
> raw). At first there was just charToGlyph, then charToVariationGlyph was 
> added and now you added a "raw" version for each of them, I see that in the 
> future we will need other variants and how it's already starting an 
> exponential explosion.

I don't know if I would call two changes to `CharToGlyphMapper` in 20 years an 
exponential explosion, but I get your point :-)

> Overriding all of those methods in each implementation brings quite a bit of 
> boilerplate, and it becomes easier to miss something.

True, but again keep in mind that there are only 5 implementations, only one of 
which (the macOS `CCharToGlyphMapper`) has been added in the last 20 years.

> Maybe take a step back and refactor this into a single charToGlyph(int 
> unicode, int variationSelector, boolean raw) version?

We'd still need separate methods for `int` vs. `char`, but I think this might 
reduce 5 methods down to 3? The changeset would be a bit more intrusive (lots 
of callers would need to change to reflect the new method signature). I'd be 
interested to hear thoughts from some of the reviewers on this one.

> Also, this raw parameter only really controls isDefaultIgnorable check in the 
> end of each method. Maybe we could factor this out without bringing it 
> separately into each mapper implementation?

I prefer to think of it as controlling whether or not any transformations to 
`INVISIBLE_GLYPH_ID` happen (right now it's just for default-ignorable 
characters, but there may be other scenarios in the future, e.g. `\r`, `\n` and 
`\t` which currently are handled elsewhere).

Any ideas for what this refactoring might look like?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24412#issuecomment-2845646011

Reply via email to