[
https://issues.apache.org/jira/browse/FOP-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15506786#comment-15506786
]
Simone Rondelli commented on FOP-1969:
--------------------------------------
I see the problem.
{code:java|title=MultiByteFont.java}
private CharSequence mapGlyphsToChars(GlyphSequence gs) {
    int ng = gs.getGlyphCount();
    CharBuffer cb = CharBuffer.allocate(gs.getUTF16CharacterCount()); // <-- Here
    int ccMissing = Typeface.NOT_FOUND;
    for (int i = 0, n = ng; i < n; i++) {
        int gi = gs.getGlyph(i);
        int cc = findCharacterFromGlyphIndex(gi); // <-- Problem
        if ((cc == 0) || (cc > 0x10FFFF)) {
            cc = ccMissing;
            log.warn("Unable to map glyph index " + gi
                    + " to Unicode scalar in font '"
                    + getFullName() + "', substituting missing character '"
                    + (char) cc + "'");
        }
        if (cc > 0x00FFFF) {
            int sh;
            int sl;
            cc -= 0x10000;
            sh = ((cc >> 10) & 0x3FF) + 0xD800;
            sl = ((cc >> 0) & 0x3FF) + 0xDC00;
            cb.put((char) sh);
            cb.put((char) sl);
        } else {
            cb.put((char) cc);
        }
    }
    cb.flip();
    return cb;
}
{code}
In Urdu, one character can be mapped to multiple glyphs. This sequence is
enough to make the program crash: اآخری. Before my modification the CharBuffer
was initialized this way: {{CharBuffer.allocate(gs.getGlyphCount());}}. That
caused a {{BufferOverflowException}} when dealing with surrogate pairs, because
one glyph then corresponds to multiple characters. This is why I changed it to
{{CharBuffer.allocate(gs.getUTF16CharacterCount());}}, which does not work in
this case, where a single character is mapped to multiple glyphs.
Now the question is: what is the correct way to count the characters in the
GlyphSequence?
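To make the two failure modes concrete, here is a small standalone sketch (not FOP code). The class {{BufferSizingDemo}} and the method {{fits}} are hypothetical names, and the {{int[]}} argument stands in for the code points that {{findCharacterFromGlyphIndex}} would return for each glyph. It only checks whether a buffer of a given capacity can hold the UTF-16 form of those code points.

```java
import java.nio.BufferOverflowException;
import java.nio.CharBuffer;

// Hypothetical demo class, not part of FOP.
class BufferSizingDemo {
    // Writes each code point as UTF-16 into a buffer of the given capacity.
    // Returns true if everything fits, false if the buffer overflows.
    static boolean fits(int capacity, int[] codePoints) {
        CharBuffer cb = CharBuffer.allocate(capacity);
        try {
            for (int cp : codePoints) {
                // One char for BMP code points, two (a surrogate pair) otherwise.
                cb.put(Character.toChars(cp));
            }
            return true;
        } catch (BufferOverflowException e) {
            return false;
        }
    }
}
```

Sizing by glyph count fails when one glyph maps to a supplementary code point ({{fits(1, new int[] {0x1F600})}} is false), and sizing by the input character count fails when one input character yields several glyphs ({{fits(1, new int[] {0x627, 0x622, 0x62E})}} is false).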
# I could use the GlyphSequence.association list and the content of
GlyphSequence.characters to count the real number of characters that
correspond to the given glyph sequence. The problem I can see is that
{{findCharacterFromGlyphIndex(gi)}} might return characters different from
(and of a different size than) the ones in GlyphSequence.characters.
# Resize the CharBuffer when it gets full.
# Put the chars into a List and then into a CharBuffer.
Any thoughts?
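A rough sketch of what options 2 and 3 could look like if the buffer simply grows on demand. This is an assumption on my side, not the FOP implementation: it swaps the {{CharBuffer}} for a {{StringBuilder}} (which is also a {{CharSequence}}), takes the already-mapped code points as an {{int[]}} instead of a {{GlyphSequence}}, and uses a hypothetical {{MISSING}} constant in place of {{Typeface.NOT_FOUND}}. It also treats negative values as unmapped, which the original check does not.

```java
// Hypothetical sketch class, not part of FOP.
class GlyphToCharsSketch {
    // Assumed stand-in for Typeface.NOT_FOUND.
    static final int MISSING = '#';

    // Converts mapped code points to UTF-16 without pre-computing the size.
    static CharSequence mapCodePointsToChars(int[] codePoints) {
        // Initial capacity is only a hint; StringBuilder grows as needed.
        StringBuilder sb = new StringBuilder(codePoints.length);
        for (int cc : codePoints) {
            if (cc <= 0 || cc > Character.MAX_CODE_POINT) {
                cc = MISSING; // substitute the missing-character marker
            }
            // Appends one char for BMP values, a surrogate pair otherwise.
            sb.appendCodePoint(cc);
        }
        return sb;
    }
}
```

With this shape the count no longer has to be known up front, so neither the glyph count nor the UTF-16 character count of the input matters for correctness.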
> Surrogate pairs not treated as single unicode codepoint for display purposes
> ----------------------------------------------------------------------------
>
> Key: FOP-1969
> URL: https://issues.apache.org/jira/browse/FOP-1969
> Project: FOP
> Issue Type: Improvement
> Components: unqualified
> Affects Versions: trunk
> Environment: Operating System: All
> Platform: All
> Reporter: Glenn Adams
> Attachments: Urdu.zip, pcltest.zip, single-byte.zip, testing.fo,
> testing.fo, testing.pdf, testing.pdf, testing.xml, testing.xsl, tiffttc.zip
>
>
> unicode codepoints outside of the BMP (base multilingual plane), i.e., whose
> scalar value is greater than 0xFFFF (65535), are coded as UTF-16 surrogate
> pairs in Java strings, which pair should be treated as a single codepoint for
> the purpose of mapping to a glyph in a font (that supports extra-BMP
> mappings);
> at present, FOP does not correctly handle this case in simple (non complex
> script) rendering paths;
> furthermore, though some support has been added to handle this in the complex
> script rendering path, it has not yet been tested, so is not necessarily
> working there either;
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)