Hello,

I'd like to propose a patch for JDK-8152680, an issue I raised earlier via bugreport.java.com, and I hope someone can sponsor it. I have Contributor status through the agreement signed by JetBrains.
The issue concerns extracting the glyph-to-character mapping from the results of text layout performed by HarfBuzz, when layout is requested for a specified substring of a text string.

In the LTR case, the existing code assumes that cluster values (which are later interpreted as character indices) are assigned by HarfBuzz starting from the beginning of the substring, but they are actually assigned starting from the beginning of the whole string (as an existing comment in HBShaper.c notes). For reference, this logic can be found at hb-buffer.cc:1470 (function hb_buffer_add_utf). The proposed fix takes this numbering scheme into account by shifting the cluster value accordingly. The GlyphCharIndicesTest test case is included to cover this fix.

The RTL case is not affected by the problem above, but it has another issue: the cluster value generated by HarfBuzz is ignored completely, and the character index is instead taken to be equal to the glyph index (in visual order). This breaks, for example, when there are more glyphs than characters, because some glyphs then reference non-existent characters. The proposed fix is simply to use the same shifted cluster value as in the LTR case; I believe it works just as well for RTL. The GlyphCharIndicesRtlTest test case is included to cover the RTL case, but I'm not sure it should be committed, as it depends on a specific Windows font that doesn't seem to be available by default in Windows 10 (even though it is supposed to be present in Windows Vista, 7 and 8). I couldn't find a better Windows font that demonstrates the issue.

The webrev for the fix is available at http://cr.openjdk.java.net/~avu/DmitryBatrak/JDK-8152680 (kindly posted by a colleague who has access to cr.openjdk.java.net).

Best regards,
Dmitry Batrak
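The cluster-to-character mapping discussed above can be illustrated with a small C sketch (C being the language of HBShaper.c). This is not the actual patch: the function clusters_to_char_indices and its parameters are hypothetical. The sketch assumes that cluster values are supplied per glyph in visual order, as HarfBuzz reports them after shaping, and that the caller wants character indices relative to the start of the shaped substring.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from HBShaper.c.
 *
 * clusters:        cluster value of each glyph, in visual order. When only a
 *                  substring is shaped, these are indices into the whole
 *                  string, not into the substring.
 * glyph_count:     number of glyphs (may exceed the number of characters).
 * substring_start: index of the first shaped character in the whole string.
 * char_indices:    output, one character index per glyph, relative to the
 *                  start of the substring (an assumption of this sketch).
 */
void clusters_to_char_indices(const unsigned int *clusters,
                              size_t glyph_count,
                              unsigned int substring_start,
                              int *char_indices)
{
    for (size_t i = 0; i < glyph_count; i++) {
        /* A cluster before the substring start would indicate a bug. */
        assert(clusters[i] >= substring_start);
        /* The same shift is applied for LTR and RTL runs; the glyph's own
         * position i is never used as a character index. */
        char_indices[i] = (int)(clusters[i] - substring_start);
    }
}
```

For example, with the shaped substring starting at index 2 of the whole string, LTR clusters {2, 3, 4} map to {0, 1, 2}. RTL clusters {4, 3, 3, 2} (four glyphs for three characters, in visual order) map to {2, 1, 1, 0}, whereas taking the glyph index as the character index would give 3 for the last glyph, past the end of the three-character substring.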