I can take a stab at this but it probably needs the ICU developer to confirm
what's happening there.
Keith Stribley wrote:
I am interested in getting clig, liga, mark, mkmk, kern OpenType tables to
be processed by the OpenJDK layout engine for the Myanmar code block.
Those are not tables. Those are features in the OpenType GSUB and GPOS tables.
Currently Unicode 5.1 Myanmar fonts cannot be used with Java AWT/Swing.
I noticed that the layout engine code in OpenJDK is essentially an old
version of the ICU layout engine and ICU is capable of rendering Myanmar
Unicode 5.1 compliant fonts such as Myanmar3 and Padauk correctly.
FYI: it was "current" at the point in JDK 6 development when it was integrated.
JDK 7 will get an updated version in due course.
The first step was to make sun.font.FontManager.isComplexCharCode()
return true for the Myanmar range. However, I then needed to modify the
sun.font.GlyphLayout.EngineRecord. This has an eflags field which is
passed to ICU.
I'm not quite sure why 0x4 is used as the value when there are marks. I
believe it corresponds to "no canonical processing", though I don't know
why that is needed.
I think you have this backwards. 0x4 means do canonical processing
and it's there for performance, i.e. if it's not set then we can skip
a lot of work. I don't recall (at all) how much that was but I
suspect it was significant.
More seriously, this does not trigger ICU kerning or
ligatures.
this.eflags needs to be set to 0x3 for this. 1=kerning, 2=ligatures (see
http://www.icu-project.org/apiref/icu4c/classLayoutEngine.html#cee4ea27f3211be215ea9b9bd3a91c32)
No, I believe that comes from _typo_flags.
My question is therefore, why aren't kerning and ligatures turned on, at
least for complex scripts. I've noticed that with Latin text that if you
set TextAttribute.KERNING and TextAttribute.LIGATURES ligatures work for
non-complex text e.g. ffi with DoulosSIL, but if you have a mark in the
text, ligatures stop working, though the mark attaches correctly. I
would therefore have thought that there is little to be lost from using
eflags = 0x3 in all the cases where eflags is set. I guess there might
be a slight speed drop, but is it still significant these days? Is there
a specific reason why kerning and ligatures haven't been enabled in ICU
when used in the JDK? Does it have some unexpected side effect?
I think the basic reason is compatibility of text advance.
Text that is rendered through drawString() and text that is rendered
via TextLayout should have the same advance.
So optional ligatures and kerning need to be requested by those
who know they want them.
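For anyone following along, requesting the optional features explicitly looks roughly like the sketch below. It uses the public java.awt.font.TextAttribute keys; the class name, the helper names and the "Serif" font are placeholders of mine, not anything taken from the JDK sources.

```java
import java.awt.Font;
import java.awt.font.TextAttribute;
import java.util.HashMap;
import java.util.Map;

public class KerningLigatureSketch {
    // Attributes an application adds when it explicitly wants the
    // optional typographic features.
    static Map<TextAttribute, Object> typographicAttributes() {
        Map<TextAttribute, Object> attrs = new HashMap<>();
        attrs.put(TextAttribute.KERNING, TextAttribute.KERNING_ON);
        attrs.put(TextAttribute.LIGATURES, TextAttribute.LIGATURES_ON);
        return attrs;
    }

    // Derive a font that carries the kerning and ligature requests.
    static Font withKerningAndLigatures(Font base) {
        return base.deriveFont(typographicAttributes());
    }

    public static void main(String[] args) {
        // "Serif" is only a placeholder; substitute the font under test.
        Font base = new Font("Serif", Font.PLAIN, 24);
        Font derived = withKerningAndLigatures(base);
        System.out.println(derived.getAttributes().get(TextAttribute.KERNING));
        System.out.println(derived.getAttributes().get(TextAttribute.LIGATURES));
    }
}
```

Text drawn with the derived font carries the request; text drawn with the base font keeps the default advances.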
You might then ask why we don't at least do this for complex
scripts, where text has to go through layout and mandatory ligatures
are performed. I would have to dig to be sure what actually happens
in ICU, but one scenario is mixed-script text, e.g. some Latin followed
by some complex script. If the optional ligatures were performed by
layout and you are in, say, a text editor and delete the complex
text, leaving only the Latin text, it would look odd if the optional
ligatures no longer formed and kerning stopped being applied.
However, if you are pointing out that TextAttribute.KERNING and
TextAttribute.LIGATURES do not get applied even when specified,
then that would seem like a bug. But my reading of
the code is that the request for kerning and ligatures is
not held in "eflags" but in "_typo_flags", and the value
passed down to layout is "_typo_flags | eflags".
As far as I can see your patch is equivalent to always
adding the TextAttribute.KERNING and TextAttribute.LIGATURES
as attributes on these two fonts (no JDK source code changes
needed). Is that what you see?
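To make the equivalence concrete, here is a small sketch of the flag arithmetic. It assumes the bit values mentioned in this thread (1 = kerning, 2 = ligatures, 4 = canonical processing) and that the attribute request ends up as those same bits in the typo flags; the class, constant and method names are made up for illustration and are not the JDK's.

```java
public class LayoutFlagsSketch {
    // Bit values as discussed in this thread; the names are hypothetical.
    static final int KERNING = 0x1;
    static final int LIGATURES = 0x2;
    static final int CANONICAL = 0x4;

    // Mirrors the "_typo_flags | eflags" expression: a plain bitwise OR.
    static int layoutFlags(int typoFlags, int eflags) {
        return typoFlags | eflags;
    }

    public static void main(String[] args) {
        // Kerning and ligatures requested through the typo flags, eflags empty.
        int viaAttributes = layoutFlags(KERNING | LIGATURES, 0);
        // Nothing requested, but eflags forced to 0x3 as in the patch.
        int viaPatch = layoutFlags(0, 0x3);
        System.out.println(viaAttributes == viaPatch);

        // Marks present and the features requested: all three bits are set.
        int withMarks = layoutFlags(KERNING | LIGATURES, CANONICAL);
        System.out.println(Integer.toHexString(withMarks));
    }
}
```

Under those assumptions both routes hand the same value to layout, which is why the attribute route should need no source change.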
Currently EngineRecord only sets eflags for NON_SPACING_MARK,
ENCLOSING_MARK, COMBINING_SPACING_MARK.
That is I believe for performance.
At the moment, this isn't
sufficient for Burmese since the character properties in the jdk haven't
been updated to Unicode 5.1, hence I enabled it for the whole code block
in my test build.
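The two ways of deciding whether a character needs this treatment can be compared with a sketch like the following. It is a standalone illustration, not the JDK's internal code: the class and method names are hypothetical, the range 1000-109F is the one used in the patch below, and the category test uses the public Character.getType(), whose answers depend on the Unicode version of the runtime it is run on.

```java
public class MyanmarRangeSketch {
    // Range test over the whole Myanmar code block (1000-109F).
    static boolean inMyanmarBlock(int codePoint) {
        return codePoint >= 0x1000 && codePoint < 0x10A0;
    }

    // General-category test of the kind EngineRecord applies to marks.
    static boolean isMarkByCategory(int codePoint) {
        int gc = Character.getType(codePoint);
        return gc == Character.NON_SPACING_MARK
            || gc == Character.ENCLOSING_MARK
            || gc == Character.COMBINING_SPACING_MARK;
    }

    public static void main(String[] args) {
        // Report both answers for every code point in the block so that
        // gaps in the runtime's character properties become visible.
        for (int cp = 0x1000; cp < 0x10A0; cp++) {
            System.out.printf("U+%04X block=%b mark=%b%n",
                    cp, inMyanmarBlock(cp), isMarkByCategory(cp));
        }
    }
}
```

The range test does not depend on the character property tables at all, which is the point of enabling the whole block.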
For reference, Myanmar fonts are available at:
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=Padauk
http://myanmarnlpteam.blogspot.com/2007/08/download-links.html
http://www.mymyanmar.net/2g/
(Another Myanmar font, Parabaik, uses OpenType rlig, which ICU doesn't
process for this code block without further code changes).
There is a possible patch below, which displays Unicode 5.1 Myanmar
correctly with Padauk, MyMyanmar Unicode and Myanmar3 fonts when used
with the methods TextLayout.draw, drawString and drawChars in
Font2DTest. Some attached marks get lost with Padauk using
TextLayout.getOutline+draw.
I would appreciate feedback on whether to submit this as a patch purely
for the Myanmar script or whether eflags should be changed more generally.
Before we can accept any patch you will need to sign and submit
the Sun Contributor Agreement. See http://openjdk.java.net/contribute/
Regards,
Keith Stribley
--- ./jdk/src/share/classes/sun/font/GlyphLayout.java.orig 2008-05-29 15:01:33.000000000 +0100
+++ ./jdk/src/share/classes/sun/font/GlyphLayout.java 2008-05-29 23:13:26.000000000 +0100
@@ -644,11 +644,15 @@
ch = toCodePoint((char)ch,_textRecord.text[++i]); // inc
}
int gc = getType(ch);
+ if (script == 28) { // Myanmar - see LEScripts.h
+ this.eflags = 0x3;// 1=kerning, 2=ligatures
+ break;
+ }
if (gc == NON_SPACING_MARK ||
gc == ENCLOSING_MARK ||
gc == COMBINING_SPACING_MARK) { // could do range test also
- this.eflags = 0x4;
+ this.eflags = 0x4; // 4 = no canonical processing, but would 0x3 be better?
I think you have this backwards. 0x4 means DO canonical processing.
break;
}
}
--- ./jdk/src/share/classes/sun/font/FontManager.java.orig 2008-05-28 12:46:03.000000000 +0100
+++ ./jdk/src/share/classes/sun/font/FontManager.java 2008-05-29 21:33:31.000000000 +0100
@@ -3594,6 +3594,12 @@
// 0E00 - 0E7F if Thai, assume shaping for vowel, tone marks
return true;
}
+ else if (code < 0x1000) {
+ return false;
+ }
+ else if (code < 0x10A0) { // 1000-109F Myanmar
+ return true;
+ }
else if (code < 0x1780) {
return false;
}
-phil.