On Mon, 18 Aug 2025 16:05:24 GMT, Volker Simonis <simo...@openjdk.org> wrote:

> ### TL;DR
> 
> This is a fix for what I think is a regression since the introduction of 
> HarfBuzz in JDK 9. The problem is that the algorithm which converts the glyph 
> vector produced by the layout engine into a corresponding character vector 
> (in `ExtendedTextSourceLabel::createCharinfo()`) still assumes that "*each 
> glyph maps to a single character*". But this is no longer true with 
> HarfBuzz and, as this example demonstrates, can lead to improper clustering of 
> characters, which can result in bad line-breaking decisions.
> 
> I ran the corresponding JTreg and JCK tests on Linux, but because this area is 
> heavily dependent on the OS and concrete fonts I'd like to kindly ask you to 
> run your internal test suites in this area if possible.  
> 
> Below you can find a longer (maybe a bit too long :) description 
> of this problem, which I wrote mainly for my own reference.
> 
> ### Full description
> 
> A customer reported a regression in JDK 9+ which leads to bad/wrong line 
> breaks for text in the Khmer language. Khmer is a [complex 
> script](https://en.wikipedia.org/wiki/Khmer_script) which was only added to 
> the Unicode standard 3.0 in 1999 (in the [Unicode block 
> U+1780..U+17FF](https://en.wikipedia.org/wiki/Khmer_(Unicode_block))) and I 
> personally don't understand Khmer at all :)
> 
> Fortunately, the customer could provide a [simple 
> reproducer](https://bugs.openjdk.org/secure/attachment/115218/KhmerTest.java) 
> which I could further condense to the following example: 
> "បានស្នើសុំនៅតែត្រូវបានបដិសេធ" (according to Google translate, this means 
> "*Requested but still denied*"). If we use OpenJDK's 
> [`LineBreakMeasurer`](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/font/LineBreakMeasurer.html)
> to lay out that paragraph (notice that Khmer has no spaces between words) to 
> fit within a specific "wrapping width", the output may look as follows with 
> JDK 8 (the exact output depends on the font and the wrapping width):
> 
> Segment: បានស្នើសុំ 0 10
> Segment: នៅតែត្រូវ 10 9
> Segment: បានបដិសេ 19 8
> Segment: ធ 27 1
> 
> I ran with both the logical 
> [DIALOG](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/Font.html#DIALOG)
> font and directly with 
> `/usr/share/fonts/truetype/ttf-khmeros-core/KhmerOS.ttf` on Ubuntu 22.04 (on 
> my system DIALOG will automatically fall back to the KhmerOS font for 
> characters from the Khmer Unicode code block). I also tried with the [Noto 
> Khmer](https://fonts.google.com/noto/specimen/Noto+Serif+Khmer) fonts but the 
> results were similar, so I'...

As @mrserb correctly mentioned, there's no longer any need to count 
`clusterExtraGlyphs`, so I've removed it completely from the code.
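
For anyone who wants to reproduce a segment listing like the one quoted above, 
here is a minimal sketch of a `LineBreakMeasurer` loop. It is not the linked 
`KhmerTest.java` reproducer; the class name `KhmerSegments`, the helper 
`segments()`, the font size 24 and the wrapping width 100 are all assumptions 
made for illustration, and the actual breaks depend on the JDK version, the 
font and the wrapping width.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.LineBreakMeasurer;
import java.awt.font.TextAttribute;
import java.text.AttributedString;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the PR or of the linked reproducer.
class KhmerSegments {

    // Returns one {start, length} pair per line chosen by LineBreakMeasurer.
    static List<int[]> segments(String text, Font font, float wrappingWidth) {
        AttributedString as = new AttributedString(text);
        as.addAttribute(TextAttribute.FONT, font);
        // A null transform, no anti-aliasing and no fractional metrics keep
        // the sketch independent of any Graphics2D instance.
        FontRenderContext frc = new FontRenderContext(null, false, false);
        LineBreakMeasurer lbm = new LineBreakMeasurer(as.getIterator(), frc);
        List<int[]> result = new ArrayList<>();
        while (lbm.getPosition() < text.length()) {
            int start = lbm.getPosition();
            lbm.nextLayout(wrappingWidth); // advances the current position
            result.add(new int[] { start, lbm.getPosition() - start });
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "បានស្នើសុំនៅតែត្រូវបានបដិសេធ";
        // Assumed font and size; DIALOG needs a Khmer fallback font installed.
        Font font = new Font(Font.DIALOG, Font.PLAIN, 24);
        for (int[] s : segments(text, font, 100f)) {
            System.out.println("Segment: "
                    + text.substring(s[0], s[0] + s[1]) + " " + s[0] + " " + s[1]);
        }
    }
}
```

Each printed line has the same shape as the quoted output (text, start offset, 
length), so runs on different JDK versions can be compared directly.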

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26825#issuecomment-3200747024
