[ https://issues.apache.org/jira/browse/FOP-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15506786#comment-15506786 ]
Simone Rondelli commented on FOP-1969:
--------------------------------------

I see the problem.

{code:java|title=MultiByteFont.java}
private CharSequence mapGlyphsToChars(GlyphSequence gs) {
    int ng = gs.getGlyphCount();
    // <-- Here: capacity is the number of UTF-16 chars in the input text
    CharBuffer cb = CharBuffer.allocate(gs.getUTF16CharacterCount());
    int ccMissing = Typeface.NOT_FOUND;
    for (int i = 0, n = ng; i < n; i++) {
        int gi = gs.getGlyph(i);
        // <-- Problem: with more glyphs than chars, this loop can put() past the capacity
        int cc = findCharacterFromGlyphIndex(gi);
        if ((cc == 0) || (cc > 0x10FFFF)) {
            cc = ccMissing;
            log.warn("Unable to map glyph index " + gi
                    + " to Unicode scalar in font '" + getFullName()
                    + "', substituting missing character '" + (char) cc + "'");
        }
        if (cc > 0x00FFFF) {
            // Supplementary code point: encode it as a UTF-16 surrogate pair (two chars)
            int sh;
            int sl;
            cc -= 0x10000;
            sh = ((cc >> 10) & 0x3FF) + 0xD800;
            sl = ((cc >> 0) & 0x3FF) + 0xDC00;
            cb.put((char) sh);
            cb.put((char) sl);
        } else {
            // BMP code point: a single char
            cb.put((char) cc);
        }
    }
    cb.flip();
    return cb;
}
{code}

In Urdu, one character can be mapped to multiple glyphs. This sequence is enough to make the program crash: اآخری

Before my modification the CharBuffer was initialized this way: {{CharBuffer.allocate(gs.getGlyphCount())}}. That caused a BufferOverflowException when dealing with surrogate pairs, because one glyph corresponds to two UTF-16 chars. This is why I changed it to {{CharBuffer.allocate(gs.getUTF16CharacterCount())}}, which does not work in this case, where a single character is mapped to multiple glyphs.

Now the question is: what is the correct way to count the characters in the GlyphSequence?

# I could use the GlyphSequence association list and the content of GlyphSequence.characters to count the real number of characters that correspond to the given glyph sequence. The problem I can see is that {{findCharacterFromGlyphIndex(gi)}} might return different chars (with different sizes) from the ones in GlyphSequence.characters.
# Resize the CharBuffer when it gets full.
# Put the chars into a List and then into a CharBuffer.

Any thoughts?
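Options 2 and 3 both come down to not pre-sizing the buffer. A minimal sketch of that idea, not FOP code: it assumes a plain {{int[]}} of glyph indices and a hypothetical {{IntUnaryOperator}} standing in for GlyphSequence and {{findCharacterFromGlyphIndex}}, and {{MISSING_CHAR}} is a hypothetical stand-in for {{Typeface.NOT_FOUND}} (logging omitted). A StringBuilder grows on demand and {{appendCodePoint}} writes a surrogate pair for code points above 0xFFFF, so neither the Urdu case nor the surrogate-pair case can overflow.

```java
import java.nio.CharBuffer;
import java.util.function.IntUnaryOperator;

public class GlyphToCharsSketch {

    // Hypothetical stand-in for Typeface.NOT_FOUND (assumption, not taken from FOP).
    static final int MISSING_CHAR = '#';

    /**
     * Maps each glyph index to a Unicode scalar value and appends it as one or two
     * UTF-16 chars. The builder grows as needed, so no up-front char count is required.
     * glyphToChar is a hypothetical replacement for findCharacterFromGlyphIndex.
     */
    public static CharSequence mapGlyphsToChars(int[] glyphs, IntUnaryOperator glyphToChar) {
        // Initial capacity is only a hint; StringBuilder reallocates when it fills up.
        StringBuilder sb = new StringBuilder(glyphs.length);
        for (int gi : glyphs) {
            int cc = glyphToChar.applyAsInt(gi);
            // Treat 0, negative and out-of-range values as unmapped
            // (appendCodePoint would throw IllegalArgumentException on invalid values).
            if (cc <= 0 || cc > Character.MAX_CODE_POINT) {
                cc = MISSING_CHAR;
            }
            // Writes a surrogate pair for cc > 0xFFFF, a single char otherwise.
            sb.appendCodePoint(cc);
        }
        // Read-only CharBuffer view over the builder's contents.
        return CharBuffer.wrap(sb);
    }

    public static void main(String[] args) {
        // Hypothetical font: glyph 1 -> 'a', glyph 2 -> U+1F600 (outside the BMP), others unmapped.
        IntUnaryOperator font = gi -> gi == 1 ? 'a' : gi == 2 ? 0x1F600 : 0;
        CharSequence cs = mapGlyphsToChars(new int[] {1, 2, 3, 1}, font);
        System.out.println(cs.length()); // 'a' + surrogate pair + '#' + 'a' = 5 chars
    }
}
```

With this approach the result length is simply the number of UTF-16 chars actually produced, so the counting question in option 1 does not arise; the cost is occasional reallocation as the builder grows.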
> Surrogate pairs not treated as single unicode codepoint for display purposes
> ----------------------------------------------------------------------------
>
> Key: FOP-1969
> URL: https://issues.apache.org/jira/browse/FOP-1969
> Project: FOP
> Issue Type: Improvement
> Components: unqualified
> Affects Versions: trunk
> Environment: Operating System: All
> Platform: All
> Reporter: Glenn Adams
> Attachments: Urdu.zip, pcltest.zip, single-byte.zip, testing.fo, testing.fo, testing.pdf, testing.pdf, testing.xml, testing.xsl, tiffttc.zip
>
>
> Unicode code points outside of the BMP (Basic Multilingual Plane), i.e., whose
> scalar value is greater than 0xFFFF (65535), are coded as UTF-16 surrogate
> pairs in Java strings, and each such pair should be treated as a single code point
> for the purpose of mapping to a glyph in a font (that supports extra-BMP
> mappings);
> at present, FOP does not correctly handle this case in simple (non-complex
> script) rendering paths;
> furthermore, though some support has been added to handle this in the complex
> script rendering path, it has not yet been tested, so it is not necessarily
> working there either;
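To illustrate the distinction the issue describes, a small sketch using only standard {{java.lang}} methods (the class and method names are hypothetical): it walks a string by code point rather than by char, so a surrogate pair yields one scalar value that could then be looked up in a font's cmap. It also checks that the manual surrogate arithmetic in {{mapGlyphsToChars}} agrees with {{Character.toChars}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SurrogatePairSketch {

    /** Splits a Java string into code points, treating each surrogate pair as one unit. */
    public static List<Integer> toCodePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);    // combines a high/low surrogate pair into one scalar
            out.add(cp);
            i += Character.charCount(cp); // advance 2 chars for supplementary code points, 1 otherwise
        }
        return out;
    }

    /** The surrogate arithmetic from mapGlyphsToChars, for one code point above 0xFFFF. */
    public static char[] encodeManually(int cc) {
        cc -= 0x10000;
        char sh = (char) (((cc >> 10) & 0x3FF) + 0xD800); // high (leading) surrogate
        char sl = (char) ((cc & 0x3FF) + 0xDC00);         // low (trailing) surrogate
        return new char[] {sh, sl};
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(s.length());                      // 4 UTF-16 chars
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        System.out.println(toCodePoints(s));                 // [97, 128512, 98]
        System.out.println(Arrays.equals(encodeManually(0x1F600), Character.toChars(0x1F600))); // true
    }
}
```

Indexing by {{String.length()}} counts UTF-16 chars, which is why a per-char lookup splits a supplementary character into two unmappable halves.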