I am interested in getting the clig, liga, mark, mkmk and kern OpenType features processed by the OpenJDK layout engine for the Myanmar code block. Currently, Unicode 5.1 Myanmar fonts cannot be used with Java AWT/Swing.
I noticed that the layout engine code in OpenJDK is essentially an old version of the ICU layout engine, and ICU is capable of rendering Unicode 5.1 compliant Myanmar fonts such as Myanmar3 and Padauk correctly. The first step was to make sun.font.FontManager.isComplexCharCode() return true for the Myanmar range.

However, I then needed to modify sun.font.GlyphLayout.EngineRecord. This has an eflags field which is passed to ICU. I'm not quite sure why 0x4 is used as the value when there are marks. I believe it corresponds to "no canonical processing", though I don't know why that is needed. More seriously, this value does not trigger ICU kerning or ligatures; this.eflags needs to be set to 0x3 for that (1 = kerning, 2 = ligatures, see http://www.icu-project.org/apiref/icu4c/classLayoutEngine.html#cee4ea27f3211be215ea9b9bd3a91c32).

My question is therefore: why aren't kerning and ligatures turned on, at least for complex scripts? I've noticed that with Latin text, if you set TextAttribute.KERNING and TextAttribute.LIGATURES, ligatures work for non-complex text (e.g. ffi with DoulosSIL), but as soon as there is a mark in the text, ligatures stop working, though the mark attaches correctly. I would therefore have thought that there is little to be lost from using eflags = 0x3 in all the cases where eflags is set. I guess there might be a slight speed drop, but is it still significant these days? Is there a specific reason why kerning and ligatures haven't been enabled in ICU when used in the JDK? Does it have some unexpected side effect?

Currently EngineRecord only sets eflags for NON_SPACING_MARK, ENCLOSING_MARK and COMBINING_SPACING_MARK. At the moment this isn't sufficient for Burmese, since the character properties in the JDK haven't been updated to Unicode 5.1; hence I enabled it for the whole code block in my test build.
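To make the flag values discussed above concrete, here is a minimal standalone sketch of the selection logic I have in mind. It is not the JDK code: the class and method names are made up for illustration, the Myanmar test uses the code point range 1000-109F rather than the script code that EngineRecord sees, the meaning attached to 0x4 is only my reading of it, and returning 0 for "no flags" is an assumption.

```java
public class EflagsSketch {
    // Flag values as described in this message.
    static final int KERNING = 0x1;
    static final int LIGATURES = 0x2;
    // Value currently set when marks are present; I believe it means
    // "no canonical processing" (assumption, not confirmed).
    static final int MARK_FLAG = 0x4;

    // Hypothetical helper: true for the Myanmar code block 1000-109F.
    static boolean isMyanmar(int codePoint) {
        return codePoint >= 0x1000 && codePoint < 0x10A0;
    }

    // Same three general categories that EngineRecord tests for.
    static boolean isMark(int codePoint) {
        int gc = Character.getType(codePoint);
        return gc == Character.NON_SPACING_MARK
            || gc == Character.ENCLOSING_MARK
            || gc == Character.COMBINING_SPACING_MARK;
    }

    // Hypothetical helper: choose flags from the first character that
    // is either Myanmar or a mark; 0 (assumed "no flags") otherwise.
    static int flagsFor(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isMyanmar(cp)) {
                return KERNING | LIGATURES; // 0x3
            }
            if (isMark(cp)) {
                return MARK_FLAG; // 0x4
            }
            i += Character.charCount(cp);
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.printf("0x%x%n", flagsFor("\u1000\u103B")); // Myanmar text
        System.out.printf("0x%x%n", flagsFor("e\u0301"));      // Latin with a combining mark
        System.out.printf("0x%x%n", flagsFor("ffi"));          // plain Latin
    }
}
```

Testing the code point range first means the result does not depend on how up to date the character property tables are, which is the reason for enabling the flags for the whole block.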
For reference, Myanmar fonts are available at:

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=Padauk
http://myanmarnlpteam.blogspot.com/2007/08/download-links.html
http://www.mymyanmar.net/2g/

(Another Myanmar font, Parabaik, uses OpenType rlig, which ICU doesn't process for this code block without further code changes.)

There is a possible patch below, which displays Unicode 5.1 Myanmar correctly with Padauk, MyMyanmar Unicode and Myanmar3 fonts when used with the methods TextLayout.draw, drawString and drawChars in Font2DTest. Some attached marks get lost with Padauk using TextLayout.getOutline+draw.

I would appreciate feedback on whether to submit this as a patch purely for the Myanmar script or whether eflags should be changed more generally.

Regards,
Keith Stribley

--- ./jdk/src/share/classes/sun/font/GlyphLayout.java.orig 2008-05-29 15:01:33.000000000 +0100
+++ ./jdk/src/share/classes/sun/font/GlyphLayout.java 2008-05-29 23:13:26.000000000 +0100
@@ -644,11 +644,15 @@
                     ch = toCodePoint((char)ch,_textRecord.text[++i]); // inc
                 }
                 int gc = getType(ch);
+                if (script == 28) { // Myanmar - see LEScripts.h
+                    this.eflags = 0x3;// 1=kerning, 2=ligatures
+                    break;
+                }
                 if (gc == NON_SPACING_MARK ||
                     gc == ENCLOSING_MARK ||
                     gc == COMBINING_SPACING_MARK) { // could do range test also
 
-                    this.eflags = 0x4;
+                    this.eflags = 0x4; // 4 = no canonical processing, but would 0x3 be better?
                     break;
                 }
             }
--- ./jdk/src/share/classes/sun/font/FontManager.java.orig 2008-05-28 12:46:03.000000000 +0100
+++ ./jdk/src/share/classes/sun/font/FontManager.java 2008-05-29 21:33:31.000000000 +0100
@@ -3594,6 +3594,12 @@
             // 0E00 - 0E7F if Thai, assume shaping for vowel, tone marks
             return true;
         }
+        else if (code < 0x1000) {
+            return false;
+        }
+        else if (code < 0x10A0) { // 1000-109F Myanmar
+            return true;
+        }
         else if (code < 0x1780) {
             return false;
         }
