Hi all,

I've taken the OpenJDK plunge and have started to investigate JDK-8270265 (https://bugs.openjdk.java.net/browse/JDK-8270265). However, I'm very new to the codebase, so I'm looking for some advice and direction.

What I've found so far:
Strings containing zero-width non-joiner (ZWNJ, U+200C) characters draw correctly to a Graphics2D -- that is, the ZWNJ chars do not draw at all, even if the font being used contains a glyph for the ZWNJ character (Tahoma, for example, contains glyph 744 for this character, with advanceWidth=0 in the hmtx table). Presumably this is handled by HarfBuzz via Java_sun_font_SunLayoutEngine_shape (in HBShaper.c).

However, when the same strings are broken into lines with LineBreakMeasurer, the ZWNJ chars are presumed to have non-zero advances. As a result, less text is allocated to each line than could actually be displayed, since the LineBreakMeasurer mistakenly thinks that the ZWNJ characters need space to be rendered.

The root cause seems to be that the StandardGlyphVector created internally for the LineBreakMeasurer is initialized in such a way that glyph IDs are coming from HarfBuzz, but HarfBuzz is providing the glyph ID for the space character (U+0020, glyph ID 3 in Tahoma) instead of the glyph ID for the ZWNJ character (glyph ID 744 in Tahoma). This means that later, when we look up the glyph metrics (to retrieve the glyph advance), we are actually getting the space (U+0020) glyph metrics (hence the non-zero advance).

I'm not very familiar with HarfBuzz, but it sounds like this U+0020 substitution is something that is done for "invisible glyphs" (https://harfbuzz.github.io/setting-buffer-properties.html, https://harfbuzz.github.io/harfbuzz-hb-buffer.html#hb-buffer-set-invisible-glyph). These "invisible glyphs" are identified by _hb_glyph_info_is_default_ignorable (https://github.com/harfbuzz/harfbuzz/blob/3d48bfc18731e3c2187a5b0666a7e94dcab0150b/src/hb-ot-layout.hh#L320) and seem to be the "Default_Ignorable_Code_Point" code points (https://unicode.org/reports/tr44/#Default_Ignorable_Code_Point).
When this substitution is performed, not only is the glyph replaced, but the advances for that glyph instance are also zeroed out (https://github.com/harfbuzz/harfbuzz/blob/368e9578873798e2d17ed78a0474dec7d4e9d6c0/src/hb-ot-shape.cc#L829).

Long story short, the glyph IDs returned by HarfBuzz are not always to be trusted, especially if we want to later use them as a basis for looking up glyph metrics.

The code (and console output) below illustrates the issue by creating two (Standard)GlyphVectors in two slightly different ways. The first GV does not get the glyph IDs from HarfBuzz, so is completely correct. The second GV does get the glyph IDs from HarfBuzz, so while the glyph positions match the first GV, the glyph metrics are incorrect.

Some options:

1. Continue to use the glyph IDs provided by HarfBuzz, but massage them afterwards:
   a. Look for space glyphs, check if they were actually space chars or not, or
   b. Look for space glyphs, check if they contributed zero advance, or
   c. Look for Default_Ignorable chars (note HarfBuzz code contains a comment "we have a modified Default_Ignorable"...), or
   d. Use hb_buffer_set_invisible_glyph to explicitly communicate replaced glyphs back to the Java code
2. Stop using HarfBuzz-provided glyph IDs completely, and use the CharToGlyphMapper used by the (correct) Font.createGlyphVector(...) code path
3. Configure HarfBuzz to provide the untransformed glyph IDs (not sure if it's possible, while still preventing the glyphs from displaying)
4. Use the SGV.positions array (which is always correct) to calculate advances... this might fix the LineBreakMeasurer use case, but SGV would remain broken
5. Using the HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES might be an option, though it would still result in SGVs with slightly different glyph ID arrays, depending on how the SGV is created
6. Something else?

Please let me know what you think. Does the analysis above have any gaps? What should a fix look like?
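To make option 1a a little more concrete, here is a minimal sketch of the check, written against plain arrays so it can run without any font installed. The class and method names are made up for illustration, and it only treats U+0020 as a "real" space; other space-like characters would need their own handling. The inputs are assumed to come from GlyphVector.getGlyphCodes(...) and GlyphVector.getGlyphCharIndices(...), with the space glyph ID looked up from the font (3 for Tahoma, per the analysis above).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK: flags glyphs that carry the
// space glyph ID although their source character is not U+0020.
public class SubstitutedGlyphFinder {

    /**
     * @param glyphIds     glyph IDs, e.g. from GlyphVector.getGlyphCodes(...)
     * @param charIndices  per-glyph character indices, e.g. from
     *                     GlyphVector.getGlyphCharIndices(...)
     * @param text         the original text that was laid out
     * @param spaceGlyphId the font's glyph ID for U+0020 (assumed known)
     * @return indices of glyphs that look like shaper-substituted spaces
     */
    static List<Integer> findSubstituted(int[] glyphIds, int[] charIndices,
                                         char[] text, int spaceGlyphId) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < glyphIds.length; i++) {
            boolean isSpaceGlyph = glyphIds[i] == spaceGlyphId;
            boolean isSpaceChar = text[charIndices[i]] == ' ';
            if (isSpaceGlyph && !isSpaceChar) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Glyph IDs as reported for the layoutGlyphVector() case in this mail.
        char[] text = "a\u200Cb\u200Cc".toCharArray();
        int[] glyphIds = {68, 3, 69, 3, 70};
        int[] charIndices = {0, 1, 2, 3, 4}; // assumed 1:1 mapping for this string
        System.out.println(findSubstituted(glyphIds, charIndices, text, 3)); // prints [1, 3]
    }
}
```

A real fix would still have to decide what to do with the flagged glyphs (restore the cmap glyph ID, or force a zero advance), which this sketch deliberately leaves open.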
Happy to answer any questions, research any gaps, or take a stab at a solution that seems promising to the group.

Option 5 seems most promising to me, assuming the removal does not prevent behavior triggered by the removed character (i.e. ZWNJ still needs to prevent ligatures even if it is removed), and assuming we are OK with createGlyphVector() and layoutGlyphVector() returning slightly different GVs (but at least internally consistent, and externally consistent from a visual perspective).

Take care,
Daniel

---

import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.awt.geom.AffineTransform;
import java.io.File;
import java.util.Arrays;

// Class name is arbitrary; it only exists so the snippet compiles on its own.
public class ZwnjDemo {

    public static void main(String... args) throws Exception {
        // Three letters separated by ZWNJ (U+200C)
        String s = "a\u200Cb\u200Cc";
        FontRenderContext frc = new FontRenderContext(new AffineTransform(), true, true);
        Font tahoma = Font.createFont(Font.TRUETYPE_FONT, new File("C:/Windows/Fonts/tahoma.ttf")).deriveFont(50f);

        GlyphVector gv1 = tahoma.createGlyphVector(frc, s);
        log(">>> font.createGlyphVector (GOOD)", gv1);

        // layoutGlyphVector() calls the same methods used internally by
        // LineBreakMeasurer -> TextMeasurer -> ExtendedTextSourceLabel
        GlyphVector gv2 = tahoma.layoutGlyphVector(frc, s.toCharArray(), 0, s.length(), Font.LAYOUT_LEFT_TO_RIGHT);
        log(">>> font.layoutGlyphVector (BAD)", gv2);
    }

    // Prints the positions, glyph IDs and per-glyph advances of a GlyphVector
    private static void log(String name, GlyphVector gv) {
        System.out.println(name);
        int glyphs = gv.getNumGlyphs();
        float[] positions = gv.getGlyphPositions(0, glyphs, null);
        System.out.println("positions: " + Arrays.toString(positions));
        int[] gids = gv.getGlyphCodes(0, glyphs, null);
        System.out.println("glyph IDs: " + Arrays.toString(gids));
        float[] advances = new float[glyphs];
        for (int i = 0; i < glyphs; i++) {
            advances[i] = gv.getGlyphMetrics(i).getAdvanceX();
        }
        System.out.println("advances: " + Arrays.toString(advances));
        System.out.println();
    }
}

>>> font.createGlyphVector (GOOD)
positions: [0.0, 0.0, 26.245117, 0.0, 26.245117, 0.0, 53.881836, 0.0, 53.881836, 0.0]
glyph IDs: [68, 744, 69, 744, 70]
advances: [26.245117, 0.0, 27.636719, 0.0, 23.07129]

>>> font.layoutGlyphVector (BAD)
positions: [0.0, 0.0, 26.245117, 0.0, 26.245117, 0.0, 53.881836, 0.0, 53.881836, 0.0]
glyph IDs: [68, 3, 69, 3, 70]
advances: [26.245117, 15.625, 27.636719, 15.625, 23.07129]
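Since the positions in both outputs above are identical, option 4 could in principle recover the advances from positions alone. The following is a minimal sketch under a few assumptions: the layout is horizontal and left-to-right, and the position array holds one extra (x, y) pair for the position after the last glyph (as I understand the javadoc, GlyphVector.getGlyphPositions(0, gv.getNumGlyphs() + 1, null) provides that). The class and method names are hypothetical, and the sample numbers in main are made up rather than taken from Tahoma.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the JDK: derives x advances from a flat
// [x0, y0, x1, y1, ..., xN, yN] array holding N glyph positions plus the
// position after the last glyph.
public class PositionAdvances {

    static float[] advancesFromPositions(float[] positions) {
        // One (x, y) pair per glyph, plus one trailing end position.
        int glyphs = Math.max(0, positions.length / 2 - 1);
        float[] advances = new float[glyphs];
        for (int i = 0; i < glyphs; i++) {
            // x of the next position minus x of this glyph's position
            advances[i] = positions[2 * (i + 1)] - positions[2 * i];
        }
        return advances;
    }

    public static void main(String[] args) {
        // Made-up positions for three glyphs; the middle one takes no space,
        // the way a ZWNJ should.
        float[] positions = {0f, 0f, 10f, 0f, 10f, 0f, 25f, 0f};
        System.out.println(Arrays.toString(advancesFromPositions(positions)));
        // prints [10.0, 0.0, 15.0]
    }
}
```

This sidesteps the glyph metrics lookup entirely, so it would not repair getGlyphMetrics() on the SGV itself; it only shows that the information LineBreakMeasurer needs is already present in the positions.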