On Tue, 19 Aug 2025 14:41:54 GMT, Volker Simonis <simo...@openjdk.org> wrote:
>> ### TL;DR
>>
>> This is a fix for what I think is a regression since the introduction of
>> HarfBuzz in JDK 9. The problem is that the algorithm which converts the
>> glyph vector produced by the layout engine into a corresponding character
>> vector (in `ExtendedTextSourceLabel::createCharinfo()`) still assumes that
>> "*each glyph maps to a single character*". But this is not true any more
>> with HarfBuzz and, as this example demonstrates, can lead to improper
>> clustering of characters, which can result in bad line-breaking decisions.
>>
>> I ran the corresponding JTreg and JCK tests on Linux, but because this area
>> is heavily dependent on the OS and the concrete fonts, I'd like to kindly
>> ask you to run your internal test suites in this area if possible.
>>
>> In the following you can find a longer (maybe a bit too long :) description
>> of this problem which I mainly wrote for my own memory.
>>
>> ### Full description
>>
>> A customer reported a regression in JDK 9+ which leads to bad/wrong line
>> breaks for text in the Khmer language. Khmer is a [complex
>> script](https://en.wikipedia.org/wiki/Khmer_script) which was only added to
>> the Unicode standard 3.0 in 1999 (in the [Unicode block
>> U+1780..U+17FF](https://en.wikipedia.org/wiki/Khmer_(Unicode_block))) and I
>> personally don't understand Khmer at all :)
>>
>> Fortunately, the customer could provide a [simple
>> reproducer](https://bugs.openjdk.org/secure/attachment/115218/KhmerTest.java)
>> which I could further condense to the following example:
>> "បានស្នើសុំនៅតែត្រូវបានបដិសេធ" (according to Google translate, this means
>> "*Requested but still denied*"). If we use OpenJDK's
>> [`LineBreakMeasurer`](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/font/LineBreakMeasurer.html)
>> to lay out that paragraph (notice that Khmer has no spaces between words)
>> to fit within a specific "wrapping width", the output may look as follows
>> with JDK 8 (the exact output depends on the font and the wrapping width):
>>
>> Segment: បានស្នើសុំ 0 10
>> Segment: នៅតែត្រូវ 10 9
>> Segment: បានបដិសេ 19 8
>> Segment: ធ 27 1
>>
>> I ran with both the logical
>> [DIALOG](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/Font.html#DIALOG)
>> font and directly with
>> `/usr/share/fonts/truetype/ttf-khmeros-core/KhmerOS.ttf` on Ubuntu 22.04 (on
>> my system DIALOG will automatically fall back to the KhmerOS font for
>> characters from the Khmer Unicode code block). I also tried with the [Noto
>> Khmer](https://fonts.google.com/noto/specimen/Noto+Serif+Khmer) f...
>
> Volker Simonis has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Added JTreg test to verify monotonically growing glyph character indices

Marked as reviewed by serb (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/26825#pullrequestreview-3142319406
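
For readers following the thread, here is a minimal sketch of the idea behind "monotonically growing glyph character indices" and of why "each glyph maps to a single character" is too strong an assumption. It is not the code from this pull request and not the logic in `ExtendedTextSourceLabel::createCharinfo()`. The class `GlyphClusterSketch` and its methods are hypothetical, and the sketch assumes left-to-right text where the shaper reports, for each glyph, the index of the first character of the cluster that glyph belongs to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the JDK or of this PR.
final class GlyphClusterSketch {

    /** Returns true if the per-glyph character indices never decrease. */
    static boolean isMonotonic(int[] glyphCharIndices) {
        for (int i = 1; i < glyphCharIndices.length; i++) {
            if (glyphCharIndices[i] < glyphCharIndices[i - 1]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Groups characters into clusters, returned as {start, length} pairs.
     * Several glyphs may share one character index, and one glyph may cover
     * several characters, so glyphs and characters are not matched one to one.
     */
    static List<int[]> clusters(int[] glyphCharIndices, int charCount) {
        if (!isMonotonic(glyphCharIndices)) {
            throw new IllegalArgumentException("glyph character indices must not decrease");
        }
        List<int[]> result = new ArrayList<>();
        int start = -1; // start index of the cluster currently being built
        for (int idx : glyphCharIndices) {
            if (idx != start) {
                if (start >= 0) {
                    // The previous cluster ends where the next one begins.
                    result.add(new int[] {start, idx - start});
                }
                start = idx;
            }
        }
        if (start >= 0) {
            // The last cluster extends to the end of the text.
            result.add(new int[] {start, charCount - start});
        }
        return result;
    }
}
```

With the made-up input `{0, 0, 2, 3}` for a five-character string, the first two glyphs share character 0, so the sketch yields three clusters rather than four: characters 0..1, character 2, and characters 3..4. A line breaker that only breaks at cluster boundaries would then never split the first or the last cluster.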