> ### TL;DR
> 
> This is a fix for what I think is a regression since the introduction of 
> HarfBuzz in JDK 9. The problem is that the algorithm which converts the glyph 
> vector produced by the layout engine into a corresponding character vector 
> (in `ExtendedTextSourceLabel::createCharinfo()`) still assumes that "*each 
> glyph maps to a single character*". This is no longer true with 
> HarfBuzz and, as the example below demonstrates, it can lead to improper 
> clustering of characters, which can result in bad line-breaking decisions.
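
To make the clustering point concrete, here is a minimal sketch with made-up data. It is not the code in `ExtendedTextSourceLabel::createCharinfo()`; the class `ClusterSketch`, the helper `clusterStarts`, and the sample glyph-to-character array are all hypothetical, and it assumes left-to-right text whose glyphs are not reordered (real Khmer shaping can reorder glyphs). It only shows why the number of glyphs and the number of characters can differ once a shaping engine forms ligatures or splits a character into several glyphs.

```java
import java.util.ArrayList;
import java.util.List;

public class ClusterSketch {
    /**
     * Hypothetical helper: glyphToChar[g] is the index of the first character
     * that glyph g was produced from. Assumes the indices are non-decreasing.
     * Returns the start index of every character cluster.
     */
    static List<Integer> clusterStarts(int[] glyphToChar) {
        List<Integer> starts = new ArrayList<>();
        int last = -1;
        for (int c : glyphToChar) {
            if (c != last) {      // several glyphs may come from the same character
                starts.add(c);
                last = c;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Made-up shaping result for 6 characters and 5 glyphs:
        // characters 1..3 collapse into one glyph, character 4 expands into two glyphs.
        int[] glyphToChar = {0, 1, 4, 4, 5};
        System.out.println(clusterStarts(glyphToChar)); // [0, 1, 4, 5]
    }
}
```

In this toy data, characters 2 and 3 do not start a cluster of their own, so a line break directly before either of them would split a cluster.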
> 
> I ran the corresponding JTreg and JCK tests on Linux, but because this area 
> depends heavily on the OS and the concrete fonts, I'd like to kindly ask you 
> to run your internal test suites in this area if possible.  
> 
> In the following you can find a longer (maybe a bit too long :) description 
> of this problem, which I mainly wrote for my own reference.
> 
> ### Full description
> 
> A customer reported a regression in JDK 9+ which leads to bad/wrong line 
> breaks for text in the Khmer language. Khmer is a [complex 
> script](https://en.wikipedia.org/wiki/Khmer_script) which was only added in 
> version 3.0 of the Unicode standard in 1999 (in the [Unicode block 
> U+1780..U+17FF](https://en.wikipedia.org/wiki/Khmer_(Unicode_block))), and I 
> personally don't understand Khmer at all :)
> 
> Fortunately, the customer could provide a [simple 
> reproducer](https://bugs.openjdk.org/secure/attachment/115218/KhmerTest.java) 
> which I could further condense to the following example: 
> "បានស្នើសុំនៅតែត្រូវបានបដិសេធ" (according to Google translate, this means 
> "*Requested but still denied*"). If we use OpenJDK's 
> [`LineBreakMeasurer`](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/font/LineBreakMeasurer.html)
>  to lay out that paragraph (note that Khmer has no spaces between words) so 
> that it fits within a specific "wrapping width", the output may look as 
> follows with JDK 8 (the exact output depends on the font and the wrapping width):
> 
> Segment: បានស្នើសុំ 0 10
> Segment: នៅតែត្រូវ 10 9
> Segment: បានបដិសេ 19 8
> Segment: ធ 27 1
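
For readers who want to produce output in the shape shown above, the following is a minimal sketch and not the linked `KhmerTest.java` reproducer. The class name `WrapSketch`, the helper `wrap`, the font size of 12 and the wrapping width of 60 are assumptions chosen for illustration; the resulting segments depend on the installed fonts, the JDK version and the width, so no particular output is claimed.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.LineBreakMeasurer;
import java.awt.font.TextAttribute;
import java.text.AttributedString;
import java.util.ArrayList;
import java.util.List;

public class WrapSketch {
    /** Wraps text to the given width and returns one {start, length} pair per line. */
    static List<int[]> wrap(String text, Font font, float width) {
        AttributedString as = new AttributedString(text);
        as.addAttribute(TextAttribute.FONT, font);
        // No transform, anti-aliasing and fractional metrics enabled.
        FontRenderContext frc = new FontRenderContext(null, true, true);
        LineBreakMeasurer lbm = new LineBreakMeasurer(as.getIterator(), frc);

        List<int[]> segments = new ArrayList<>();
        while (lbm.getPosition() < text.length()) {
            int start = lbm.getPosition();
            lbm.nextLayout(width);          // advances the measurer's position
            int end = lbm.getPosition();
            segments.add(new int[] {start, end - start});
        }
        return segments;
    }

    public static void main(String[] args) {
        String text = "បានស្នើសុំនៅតែត្រូវបានបដិសេធ";
        // Assumed font and width; DIALOG falls back to whatever Khmer font is installed.
        Font font = new Font(Font.DIALOG, Font.PLAIN, 12);
        for (int[] s : wrap(text, font, 60f)) {
            System.out.println("Segment: " + text.substring(s[0], s[0] + s[1])
                    + " " + s[0] + " " + s[1]);
        }
    }
}
```

Running the same program on JDK 8 and on a newer JDK with the same font and width is one way to compare the line-breaking decisions described here.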
> 
> I ran with both the logical 
> [DIALOG](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/Font.html#DIALOG)
>  font and directly with 
> `/usr/share/fonts/truetype/ttf-khmeros-core/KhmerOS.ttf` on Ubuntu 22.04 (on 
> my system DIALOG will automatically fall back to the KhmerOS font for 
> characters from the Khmer Unicode code block). I also tried with the [Noto 
> Khmer](https://fonts.google.com/noto/specimen/Noto+Serif+Khmer) fonts but the 
> results were similar, so I'...

Volker Simonis has updated the pull request incrementally with one additional 
commit since the last revision:

  No need to count 'clusterExtraGlyphs' any more

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26825/files
  - new: https://git.openjdk.org/jdk/pull/26825/files/a52916b2..ba3e50b2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26825&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26825&range=00-01

  Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/26825.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26825/head:pull/26825

PR: https://git.openjdk.org/jdk/pull/26825
