On Tue, 19 Aug 2025 14:41:54 GMT, Volker Simonis <simo...@openjdk.org> wrote:

>> ### TL;DR
>> 
>> This is a fix for what I think is a regression since the introduction of 
>> HarfBuzz in JDK 9. The problem is that the algorithm which converts the 
>> glyph vector produced by the layout engine into a corresponding character 
>> vector (in `ExtendedTextSourceLabel::createCharinfo()`) still assumes that 
>> "*each glyph maps to a single character*". But this is not true any more 
>> with HarfBuzz and, as this example demonstrates, can lead to improper 
>> clustering of characters, which can result in bad line breaking decisions.
>> 
>> I ran the corresponding JTreg and JCK tests on Linux, but because this area is 
>> heavily dependent on the OS and the concrete fonts, I'd like to kindly ask you to 
>> run your internal test suites in this area if possible.  
>> 
>> Below is a longer (maybe a bit too long :) description of this problem, 
>> which I mainly wrote for my own reference.
>> 
>> ### Full description
>> 
>> A customer reported a regression in JDK 9+ which leads to bad/wrong line 
>> breaks for text in the Khmer language. Khmer is a [complex 
>> script](https://en.wikipedia.org/wiki/Khmer_script) which was only added to the 
>> Unicode Standard in version 3.0 in 1999 (in the [Unicode block 
>> U+1780..U+17FF](https://en.wikipedia.org/wiki/Khmer_(Unicode_block))) and I 
>> personally don't understand Khmer at all :)
>> 
>> Fortunately, the customer could provide a [simple 
>> reproducer](https://bugs.openjdk.org/secure/attachment/115218/KhmerTest.java)
>>  which I could further condense to the following example: 
>> "បានស្នើសុំនៅតែត្រូវបានបដិសេធ" (according to Google translate, this means 
>> "*Requested but still denied*"). If we use OpenJDK's 
>> [`LineBreakMeasurer`](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/font/LineBreakMeasurer.html)
>>  to lay out that paragraph (notice that Khmer has no spaces between words) to 
>> fit within a specific "wrapping width", the output may look as follows with 
>> JDK 8 (the exact output depends on the font and the wrapping width):
>> 
>> Segment: បានស្នើសុំ 0 10
>> Segment: នៅតែត្រូវ 10 9
>> Segment: បានបដិសេ 19 8
>> Segment: ធ 27 1
>> 
>> I ran with both the logical 
>> [DIALOG](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/Font.html#DIALOG)
>>  font and directly with 
>> `/usr/share/fonts/truetype/ttf-khmeros-core/KhmerOS.ttf` on Ubuntu 22.04 (on 
>> my system DIALOG will automatically fall back to the KhmerOS font for 
>> characters from the Khmer Unicode code block). I also tried with the [Noto 
>> Khmer](https://fonts.google.com/noto/specimen/Noto+Serif+Khmer) f...
>
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Added JTreg test to verify monotonically growing glyph character indices

Marked as reviewed by serb (Reviewer).
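
The property named in the latest commit message can be illustrated with a small standalone sketch. This is not the JTreg test from the pull request: the class `GlyphCharIndexCheck` and its helper methods are hypothetical names, and the sketch assumes that "monotonically growing" means non-decreasing for left-to-right text, since several glyphs may belong to one character cluster. It relies only on the public `Font.layoutGlyphVector` and `GlyphVector.getGlyphCharIndices` APIs, and the indices it prints depend on the fonts installed on the system.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.util.Arrays;

// Hypothetical helper, not part of the JDK or of the pull request.
final class GlyphCharIndexCheck {

    /** Returns true if the indices never decrease from one glyph to the next. */
    static boolean isNonDecreasing(int[] charIndices) {
        for (int i = 1; i < charIndices.length; i++) {
            if (charIndices[i] < charIndices[i - 1]) {
                return false;
            }
        }
        return true;
    }

    /** Lays out the text with the given font and returns the character index of each glyph. */
    static int[] glyphCharIndices(Font font, String text) {
        FontRenderContext frc = new FontRenderContext(null, true, true);
        char[] chars = text.toCharArray();
        GlyphVector gv = font.layoutGlyphVector(
                frc, chars, 0, chars.length, Font.LAYOUT_LEFT_TO_RIGHT);
        // Passing null asks the GlyphVector to allocate the result array.
        return gv.getGlyphCharIndices(0, gv.getNumGlyphs(), null);
    }

    public static void main(String[] args) {
        // Assumption: the logical DIALOG font falls back to a Khmer-capable font.
        Font font = new Font(Font.DIALOG, Font.PLAIN, 12);
        String text = "បានស្នើសុំនៅតែត្រូវបានបដិសេធ";
        int[] indices = glyphCharIndices(font, text);
        System.out.println(Arrays.toString(indices));
        System.out.println("non-decreasing: " + isNonDecreasing(indices));
    }
}
```

Keeping the ordering check separate from the layout call means it can be exercised with hand-written index arrays, independent of which fonts are available.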

-------------

PR Review: https://git.openjdk.org/jdk/pull/26825#pullrequestreview-3142319406
