On Thu, 3 Apr 2025 11:23:42 GMT, Daniel Gredler <dgred...@openjdk.org> wrote:
> It looks like this regression actually fits into a longer series of fixes / > regressions in this area: > > - [JDK-4517298](https://bugs.openjdk.org/browse/JDK-4517298) fixed metrics > for zero-width characters, but broke some ligatures / glyph substitutions > - [JDK-7017058](https://bugs.openjdk.org/browse/JDK-7017058) fixed the > ligatures / glyph substitutions, but broke some zero-width metrics > - [JDK-8208377](https://bugs.openjdk.org/browse/JDK-8208377) fixed some > metrics and rendering for zero-width characters, but broke some ligatures / > glyph substitutions > - Now, with this PR, we aim to fix the ligatures without re-breaking > zero-width metrics and display > > We have two different types of use cases pulling `CharToGlyphMapper` in two > different directions: the users who need raw, untransformed glyph info, and > the users who need normalized / transformed glyph info. > > It looks to me like, in the current code base, the only `CharToGlyphMapper` > user which requires raw font data is HarfBuzz (explicitly confirmed with the > HarfBuzz team here: https://github.com/harfbuzz/harfbuzz/discussions/5234). > > The regression mechanism at play here is that the HarfBuzz font callbacks are > currently providing HarfBuzz with transformed glyph info (e.g. ZWJ -> > INVISIBLE_GLYPH_ID), which prevents HarfBuzz from recognizing and applying > the correct font GSUB substitutions (which involve ZWJ). > > In order to fix this without (yet again) breaking metrics and display > behavior elsewhere, I've added two methods to `CharToGlyphMapper` which > provide access to raw glyph info, to be used by the HarfBuzz font callbacks: > `charToGlyphRaw(int)` and `charToVariationGlyphRaw(int)`. > > Note two intricacies related to `CompositeGlyphMapper`: > 1. We need to be careful to only cache raw (untransformed) values, to avoid > conflicts between requests for a raw version of a glyph and a transformed > version of the same glyph. Another option would have been two separate > caches, but I don't think that's necessary. > 2. Consumers who are using `CompositeGlyphMapper.SLOTMASK` to check glyph > slots (e.g. `FontRunIterator` and `CTextPipe`) will "see" invisible glyphs as > having come from slot 0. This isn't new, and I think it's OK, but something > to be aware of. > > The glyph cache handling in `CCharToGlyphMapper` (for macOS) also requires > care to avoid mixing value types. > > Please also note that I'm not sure if the tweak to `sunFont.c` is being > tested, since FFM is being used by default for HarfBuzz integration. (Is > there a plan to remove... I was talking about the explosion because there is a scenario in my mind, which I didn't make clear for everybody else. There is a change which I didn't have time to contribute, but would like to: it's related to composite fonts and variation selectors. We may need 2 variants for retrieving a glyph with a variation selector - one strictly matching a variation selector and another with a fallback to the base glyph, multiplied by raw/transformed versions, which adds 2 more methods. Not like it's a big problem, but given that they all end up calling a single method anyway... You get the point. > there may be other scenarios in the future, e.g. \r, \n and \t which > currently are handled elsewhere). Are those scenarios specific to a patricular mapper/font type? I was thinking that those transformations are generic. > Any ideas for what this refactoring might look like? I was thinking about moving this default-ignorable or any potential generic transformation into base `CharToGlyphMapper` or even `Font2D`. For example, make default implementation of `CharToGlyphMapper.charToGlyph` check ignorable characters and then call `charToGlyphRaw` - then other implementations would only need to override `charToGlyphRaw`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24412#issuecomment-2845865987