On Tue, 19 Aug 2025 14:41:54 GMT, Volker Simonis <simo...@openjdk.org> wrote:
>> ### TL;DR
>>
>> This is a fix for what I think is a regression since the introduction of
>> HarfBuzz in JDK 9. The problem is that the algorithm which converts the
>> glyph vector produced by the layout engine into a corresponding character
>> vector (in `ExtendedTextSourceLabel::createCharinfo()`) still assumes that
>> "*each glyph maps to a single character*". But this is not true any more
>> with HarfBuzz and, as this example demonstrates, it can lead to improper
>> clustering of characters which can result in bad line breaking decisions.
>>
>> I ran the corresponding JTreg and JCK tests on Linux but because this area is
>> heavily dependent on the OS and concrete fonts I'd like to kindly ask you to
>> run your internal test suites in this area if possible.
>>
>> In the following you can find a longer (maybe a bit too long :) description
>> of this problem which I merely wrote for my own memory.
>>
>> ### Full description
>>
>> A customer reported a regression in JDK 9+ which leads to bad/wrong line
>> breaks for text in the Khmer language. Khmer is a [complex
>> script](https://en.wikipedia.org/wiki/Khmer_script) which was only added to
>> the Unicode standard 3.0 in 1999 (in the [Unicode block
>> U+1780..U+17FF](https://en.wikipedia.org/wiki/Khmer_(Unicode_block))) and I
>> personally don't understand Khmer at all :)
>>
>> Fortunately, the customer could provide a [simple
>> reproducer](https://bugs.openjdk.org/secure/attachment/115218/KhmerTest.java)
>> which I could further condense to the following example:
>> "បានស្នើសុំនៅតែត្រូវបានបដិសេធ" (according to Google translate, this means
>> "*Requested but still denied*").
>> If we use OpenJDK's
>> [`LineBreakMeasurer`](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/font/LineBreakMeasurer.html)
>> to lay out that paragraph (notice that Khmer has no spaces between words) to
>> fit within a specific "wrapping width", the output may look as follows with
>> JDK 8 (the exact output depends on the font and the wrapping width):
>>
>> Segment: បានស្នើសុំ 0 10
>> Segment: នៅតែត្រូវ 10 9
>> Segment: បានបដិសេ 19 8
>> Segment: ធ 27 1
>>
>> I ran with both the logical
>> [DIALOG](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/Font.html#DIALOG)
>> font and directly with
>> `/usr/share/fonts/truetype/ttf-khmeros-core/KhmerOS.ttf` on Ubuntu 22.04 (on
>> my system DIALOG will automatically fall back to the KhmerOS font for
>> characters from the Khmer Unicode code block). I also tried with the [Noto
>> Khmer](https://fonts.google.com/noto/specimen/Noto+Serif+Khmer) f...
>
> Volker Simonis has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Added JTreg test to verify monotonically growing glyph character indices

I've now added a test which checks that, for a given Khmer string, we will not break glyph clusters. It does this with all system fonts which can fully display the given example string, at various wrapping widths. It fails without the fix from this PR and succeeds with this change.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26825#issuecomment-3225089616
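To illustrate what "monotonically growing glyph character indices" means in the commit message above, here is a minimal sketch. It is not the JTreg test from this PR. The class `GlyphIndexCheck` and its methods are hypothetical names chosen for illustration, and the sketch assumes left-to-right text and reads "monotonically growing" as "non-decreasing". In a real check the input array might come from `GlyphVector.getGlyphCharIndices`, but that is an assumption about how one could obtain the data, not a description of the actual test.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration only; not part of the PR.
final class GlyphIndexCheck {

    private GlyphIndexCheck() {
    }

    /**
     * Returns the position of the first entry that is smaller than its
     * predecessor, or an empty OptionalInt if the array is non-decreasing.
     * Assumes left-to-right text, where glyph i maps to character
     * index glyphCharIndices[i].
     */
    static OptionalInt firstDecrease(int[] glyphCharIndices) {
        for (int i = 1; i < glyphCharIndices.length; i++) {
            if (glyphCharIndices[i] < glyphCharIndices[i - 1]) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    /** True if no glyph maps to an earlier character than the glyph before it. */
    static boolean isNonDecreasing(int[] glyphCharIndices) {
        return !firstDecrease(glyphCharIndices).isPresent();
    }
}
```

Equal neighbouring values are accepted on purpose, since several glyphs may belong to the same character; only a step backwards is reported.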