[jira] [Commented] (FOP-1969) Surrogate pairs not treated as single unicode codepoint for display purposes

ASF GitHub Bot (JIRA) Mon, 19 Sep 2016 09:23:36 -0700

    [ 
https://issues.apache.org/jira/browse/FOP-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503903#comment-15503903
 ]


ASF GitHub Bot commented on FOP-1969:
-------------------------------------

GitHub user monejava opened a pull request:

    https://github.com/apache/fop/pull/3

    FOP-1969: Surrogate pairs not treated as single unicode codepoint for…

    Implemented correct handling of surrogate pairs in Apache FOP. The supported 
renderers are PDF, PS and PNG. Tests were added where possible. 
    
    Here is a brief explanation of the design choices I made when modifying 
the public API:
    
    `mapChar(char)`/`hasChar(char)`: these are defined in `Typeface`, which 
means they have more than 20 implementations. Modifying this interface would 
require a lot of work and might introduce a lot of bugs. That's why Glenn Adams 
(our contact in the Apache FOP project) asked us to create new methods rather 
than change the existing ones. In some of these implementations, such as 
`SingleByteFont`, it is semantically correct for a character to be represented 
by a single UTF-16 code unit. In others, such as `CIDFont` 
(http://www.adobe.com/products/postscript/pdfs/cid.pdf), it is not, since they 
are meant to cover a range wider than 2^16 characters. 
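
    As a minimal illustration of the problem (plain JDK code, not taken from the 
patch; the class and method names are hypothetical), the sketch below shows that 
a char-based API sees a non-BMP character as two separate surrogate code units, 
while a code-point-based API sees one character:

```java
final class CharVersusCodePoint {
    // "A" followed by U+1F600, which lies outside the BMP and therefore
    // occupies two UTF-16 code units (a surrogate pair) in a Java String.
    static final String SAMPLE = "A" + new String(Character.toChars(0x1F600));

    /** Number of units a char-based API such as mapChar(char) would be handed. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of characters a code-point-based API would be handed. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** True if iterating char by char would pass half of a pair to a lookup. */
    static boolean containsSurrogate(String s) {
        for (char c : s.toCharArray()) {
            if (Character.isSurrogate(c)) {
                return true;
            }
        }
        return false;
    }
}
```

    For `SAMPLE`, `codeUnits` returns 3 and `codePoints` returns 2, which is why 
a per-char lookup cannot find the glyph for the second character.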
    
    `mapCodePoint(int)`/`hasCodePoint(int)`: I have added these 2 methods to 
the `CIDFont` class. They take an int (code point) instead of a char so that we 
can cover the full Unicode range. As you can see from the `Typeface` hierarchy, 
this change affects only 2 classes.
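
    A rough sketch of how a caller might use such methods follows. Only the 
method names and the int parameter come from the description above; the 
`CodePointFont` interface, the return types, the map-backed toy font, and the 
use of 0 as a "missing glyph" value are assumptions made for illustration and 
are not FOP code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CodePointMappingSketch {
    /** Hypothetical stand-in for the two new methods; return types are assumed. */
    interface CodePointFont {
        boolean hasCodePoint(int codePoint);
        int mapCodePoint(int codePoint);
    }

    /** Toy font backed by a code-point-to-glyph-index map (illustration only). */
    static CodePointFont fontOf(Map<Integer, Integer> glyphs) {
        return new CodePointFont() {
            @Override
            public boolean hasCodePoint(int codePoint) {
                return glyphs.containsKey(codePoint);
            }

            @Override
            public int mapCodePoint(int codePoint) {
                return glyphs.getOrDefault(codePoint, 0);
            }
        };
    }

    /** Walks the text one code point at a time, so a surrogate pair is one lookup. */
    static List<Integer> mapText(String text, CodePointFont font) {
        List<Integer> glyphIndexes = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // 0 is an assumed "missing glyph" index for this sketch.
            glyphIndexes.add(font.hasCodePoint(cp) ? font.mapCodePoint(cp) : 0);
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return glyphIndexes;
    }
}
```

    The key point is the loop step: `Character.charCount(cp)` advances past both 
halves of a surrogate pair, so each character triggers exactly one lookup.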
    
    `getUnicode()`: this is defined in `CIDSet` (it does not belong to the 
`Typeface` class or one of its subclasses). I changed the signature of this 
method to handle int instead of char because it is semantically incorrect to 
represent a Unicode code point with a single UTF-16 char. As you can see from 
the `CIDSet` hierarchy, the change affects only 3 classes.
    
    `getUnicodeFromGID()`: this method is defined in `CustomFont` and `CIDSet`. 
It never gets called from the `MultiByteFont` path, probably because 
`getUnicode()` is used instead. That is why I'm downcasting the return value 
from int to char in `CIDFull` and `CIDSubset`. Probably the best thing to do 
would be to get rid of this method or make it handle int, but again the change 
would affect more classes than the ones in our scope.
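
    To make the cost of that downcast concrete, here is a small sketch in plain 
JDK code (not taken from the patch; the class and method names are 
hypothetical). Narrowing an int code point to char keeps only the low 16 bits, 
so it is lossless only for BMP values:

```java
final class NarrowingDemo {
    /** The same narrowing conversion as a cast from int to char. */
    static char narrow(int codePoint) {
        return (char) codePoint; // silently drops everything above bit 15
    }

    /** True if the code point fits in one char, i.e. narrowing loses nothing. */
    static boolean survivesNarrowing(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }
}
```

    For example, `narrow(0x41)` is still `'A'`, but `narrow(0x1F600)` yields 
`0xF600`, an unrelated value in the Private Use Area. This is harmless only as 
long as the method is not reached with non-BMP values.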

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/monejava/fop surrogate_pairs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/fop/pull/3.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3
    
----
commit 111d6a6fa58c313293e9b79e245c8521778de2c8
Author: Rondelli <[email protected]>
Date:   2016-09-19T15:13:09Z

    FOP-1969: Surrogate pairs not treated as single unicode codepoint for 
display purposes

----


> Surrogate pairs not treated as single unicode codepoint for display purposes
> ----------------------------------------------------------------------------
>
>                 Key: FOP-1969
>                 URL: https://issues.apache.org/jira/browse/FOP-1969
>             Project: FOP
>          Issue Type: Improvement
>          Components: unqualified
>    Affects Versions: trunk
>         Environment: Operating System: All
> Platform: All
>            Reporter: Glenn Adams
>         Attachments: testing.fo, testing.fo, testing.pdf, testing.pdf, 
> testing.xml, testing.xsl
>
>
> unicode codepoints outside of the BMP (base multilingual plane), i.e., whose 
> scalar value is greater than 0xFFFF (65535), are coded as UTF-16 surrogate 
> pairs in Java strings, which pair should be treated as a single codepoint for 
> the purpose of mapping to a glyph in a font (that supports extra-BMP 
> mappings);
> at present, FOP does not correctly handle this case in simple (non complex 
> script) rendering paths;
> furthermore, though some support has been added to handle this in the complex 
> script rendering path, it has not yet been tested, so is not necessarily 
> working there either;
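
For reference, the surrogate-pair encoding mentioned in the issue description 
can be sketched with the standard UTF-16 arithmetic. This is an illustration 
only (the class and method names are hypothetical); in real code the JDK's 
`Character.toCodePoint` and `Character.toChars` do the same work:

```java
final class SurrogateArithmetic {
    /** Combines a high and a low surrogate into the code point they encode. */
    static int combine(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    /** Splits a code point above 0xFFFF into its high and low surrogates. */
    static char[] split(int codePoint) {
        int v = codePoint - 0x10000;            // 20-bit value
        char high = (char) (0xD800 + (v >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits
        return new char[] {high, low};
    }
}
```

For U+1F600, `split` gives the pair 0xD83D, 0xDE00, and `combine` maps that 
pair back to 0x1F600.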



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
