[jira] [Commented] (FOP-1969) Surrogate pairs not treated as single unicode codepoint for display purposes

Glenn Adams (JIRA) Tue, 20 Sep 2016 21:43:10 -0700

    [ https://issues.apache.org/jira/browse/FOP-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508721#comment-15508721 ]


Glenn Adams commented on FOP-1969:
----------------------------------

It appears I introduced this code in:

r1293736 | gadams | 2012-02-26 02:29:01 +0000 (Sun, 26 Feb 2012) | 1 line

http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/fonts/MultiByteFont.java?limit_changes=0&r1=1293736&r2=1293735&pathrev=1293736

I don't have a direct recollection of the rationale for using 
findCharacterFromGlyphIndex instead of using the GlyphSequence (GS), but I 
would speculate that it is because the chars in the GS correspond to the 
original input characters, while the font's reverse mapping from glyph indices 
to characters includes dynamically generated character codes (assigned to the 
PUA) when a glyph index is not associated with a standard Unicode character in 
the CMAP.

For each font instance, new character codes from the PUA are dynamically 
assigned when a reverse mapping can't be found in the CMAP.
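
To make the mechanism above concrete, here is a minimal sketch of a per-font 
reverse mapping that falls back to dynamically assigned PUA codes. It is not 
the actual MultiByteFont code: the class name ReverseGlyphMap, the methods 
addCmapEntry and charForGlyph, and the choice of the BMP Private Use Area 
(U+E000..U+F8FF) as the assignment range are all assumptions made for 
illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a per-font glyph-index-to-character reverse map. */
class ReverseGlyphMap {
    // Assumed range: the BMP Private Use Area.
    private static final int PUA_START = 0xE000;
    private static final int PUA_END = 0xF8FF;

    // glyph index -> code point, as derived from the font's CMAP
    private final Map<Integer, Integer> cmapReverse = new HashMap<>();
    // glyph index -> PUA code assigned on demand for this font instance
    private final Map<Integer, Integer> dynamic = new HashMap<>();
    private int nextPua = PUA_START;

    /** Records a CMAP entry; the first code point seen for a glyph wins. */
    void addCmapEntry(int codePoint, int glyphIndex) {
        cmapReverse.putIfAbsent(glyphIndex, codePoint);
    }

    /** Returns the CMAP character for a glyph, or a stable PUA code if none exists. */
    int charForGlyph(int glyphIndex) {
        Integer cp = cmapReverse.get(glyphIndex);
        if (cp != null) {
            return cp;
        }
        Integer assigned = dynamic.get(glyphIndex);
        if (assigned == null) {
            if (nextPua > PUA_END) {
                throw new IllegalStateException("PUA range exhausted");
            }
            assigned = nextPua++;
            dynamic.put(glyphIndex, assigned);
        }
        return assigned;
    }
}
```

In a sketch like this, a glyph produced by substitution (a ligature, say) that 
has no CMAP entry only becomes addressable through its assigned PUA code, which 
the original input characters would not carry.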

However, I would have to run some tests through a debugger to verify this case. 
My guess is that if you change this code to use the GS input chars, then it 
will break things in such a scenario.



> Surrogate pairs not treated as single unicode codepoint for display purposes
> ----------------------------------------------------------------------------
>
>                 Key: FOP-1969
>                 URL: https://issues.apache.org/jira/browse/FOP-1969
>             Project: FOP
>          Issue Type: Improvement
>          Components: unqualified
>    Affects Versions: trunk
>         Environment: Operating System: All
> Platform: All
>            Reporter: Glenn Adams
>         Attachments: Urdu.zip, pcltest.zip, single-byte.zip, testing.fo, 
> testing.fo, testing.pdf, testing.pdf, testing.xml, testing.xsl, tiffttc.zip
>
>
> Unicode code points outside of the BMP (Basic Multilingual Plane), i.e., whose 
> scalar value is greater than 0xFFFF (65535), are coded as UTF-16 surrogate 
> pairs in Java strings; each pair should be treated as a single code point for 
> the purpose of mapping to a glyph in a font (that supports extra-BMP 
> mappings);
> at present, FOP does not correctly handle this case in simple (non complex 
> script) rendering paths;
> furthermore, though some support has been added to handle this in the complex 
> script rendering path, it has not yet been tested, so is not necessarily 
> working there either;
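
As an illustration of the issue description above, the following sketch walks a 
Java string by code point rather than by char, so that a surrogate pair yields 
one value to map to a glyph. It uses only standard java.lang methods 
(String.codePointAt, Character.charCount); the class and method names are 
hypothetical and this is not FOP's rendering code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a UTF-16 string into Unicode code points. */
class CodePointWalk {
    static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            // Combines a high/low surrogate pair into one code point;
            // an unpaired surrogate is returned as-is.
            int cp = s.codePointAt(i);
            out.add(cp);
            // Advance by 2 chars for a supplementary code point, else by 1.
            i += Character.charCount(cp);
        }
        return out;
    }
}
```

A loop over s.charAt(i) would instead see the two halves of the pair as 
separate values, neither of which maps to the intended glyph.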



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
