Simone Rondelli commented on FOP-1969:

Hi FOP Users, 

I am working on a project that uses Apache FOP and, as part of that project, 
need to fix FOP-1969 [1], which has to do with supplementary character support 
(surrogate pairs). I have obtained approval to contribute these changes back to 
the community. I want to run my design past the list (and especially Glenn 
Adams) and ask a few questions before proceeding: 

# Read the CMAP in {{OpenFont.readCMAP()}}, implementing the case 
{{cmapPID == 3 && cmapEID == 10}} with {{cmapFormat == 12}}. This would let me 
populate the {{unicodeMappings}} list correctly.
# Fix the class {{GlyphMapping}} to support non-BMP code points (the class 
already contains some TODOs about non-BMP support).
# The class {{GlyphMapping}} uses {{org.apache.fop.fonts.Font}} methods such as 
{{Font.hasChar(char c)}}, {{Font.getCharWidth(char c)}}, 
{{Font.mapChar(char c)}}, etc. Since these accept a single char and a surrogate 
pair consists of two chars, I will need to modify the {{Font}} class as well. 
I think I should either:
## add overloaded methods that accept an int so that we can pass code points 
(alternatively, a separate set of methods with a code point suffix: 
{{Font.hasCodepoint(int cp)}}, {{Font.getCodePointWidth(int cp)}}, 
{{Font.mapCodepoint(int cp)}}, etc.), or
## change the existing method signatures to accept/return int.
# The class {{Font}} uses the interface {{Typeface}}, which has the same 
problem: methods that accept char. We should change either this interface or 
one of its subclasses, such as {{MultiByteFont}} or {{CIDFont}} (which denote 
fonts with a large set of code points).
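
As a rough illustration of steps 1 and 3, a format 12 subtable could be parsed 
and then queried by code point along the following lines. This is a minimal, 
self-contained sketch and not FOP code: the class {{Format12CmapSketch}} and 
its methods {{parse}}, {{mapCodePoint}}, {{mapChar}} and {{hasCodePoint}} are 
hypothetical names chosen for the example. The subtable layout (format, 
reserved, length, language, numGroups, then groups of start code, end code and 
start glyph) follows the OpenType cmap specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not FOP code: parses a cmap format 12 subtable and
// looks glyphs up by code point (int) rather than by char.
final class Format12CmapSketch {
    private final int[] startCodes;   // first code point of each group
    private final int[] endCodes;     // last code point of each group (inclusive)
    private final int[] startGlyphs;  // glyph index mapped to startCodes[i]

    private Format12CmapSketch(int[] startCodes, int[] endCodes, int[] startGlyphs) {
        this.startCodes = startCodes;
        this.endCodes = endCodes;
        this.startGlyphs = startGlyphs;
    }

    // Parses a subtable that begins at the buffer's current position.
    static Format12CmapSketch parse(ByteBuffer data) {
        ByteBuffer buf = data.duplicate().order(ByteOrder.BIG_ENDIAN);
        int format = buf.getShort() & 0xFFFF;
        if (format != 12) {
            throw new IllegalArgumentException("Not a format 12 subtable: " + format);
        }
        buf.getShort();               // reserved
        buf.getInt();                 // subtable length in bytes (unused here)
        buf.getInt();                 // language (unused here)
        int numGroups = buf.getInt();
        int[] starts = new int[numGroups];
        int[] ends = new int[numGroups];
        int[] glyphs = new int[numGroups];
        for (int i = 0; i < numGroups; i++) {
            starts[i] = buf.getInt();
            ends[i] = buf.getInt();
            glyphs[i] = buf.getInt();
        }
        return new Format12CmapSketch(starts, ends, glyphs);
    }

    // Returns the glyph index for a code point, or 0 (.notdef) if unmapped.
    // The binary search assumes groups are sorted by start code.
    int mapCodePoint(int cp) {
        int lo = 0;
        int hi = startCodes.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (cp < startCodes[mid]) {
                hi = mid - 1;
            } else if (cp > endCodes[mid]) {
                lo = mid + 1;
            } else {
                return startGlyphs[mid] + (cp - startCodes[mid]);
            }
        }
        return 0;
    }

    // The char-based method stays as a thin wrapper around the int-based one.
    int mapChar(char c) {
        return mapCodePoint(c);
    }

    boolean hasCodePoint(int cp) {
        return mapCodePoint(cp) != 0;
    }

    public static void main(String[] args) {
        // Hand-built table with two groups, for demonstration only.
        ByteBuffer table = ByteBuffer.allocate(40);
        table.putShort((short) 12).putShort((short) 0).putInt(40).putInt(0).putInt(2);
        table.putInt(0x41).putInt(0x5A).putInt(10);         // A..Z -> glyphs 10..35
        table.putInt(0x1F600).putInt(0x1F64F).putInt(500);  // U+1F600..U+1F64F -> 500..579
        table.flip();
        Format12CmapSketch cmap = parse(table);
        System.out.println(cmap.mapChar('A'));
        System.out.println(cmap.mapCodePoint(0x1F600));
    }
}
```

Keeping the char-based method as a wrapper corresponds to the overloading 
option: existing callers keep compiling while new callers pass full code 
points.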

So far my research has stopped at this point. Before proceeding, I would like 
some feedback on whether I am heading in a good direction and whether I am 
missing anything.

(I sent the same message to the mailing list; I am posting it here to make 
clear that somebody is willing to work on this.)

Simone Rondelli

> Surrogate pairs not treated as single unicode codepoint for display purposes
> ----------------------------------------------------------------------------
>                 Key: FOP-1969
>                 URL: https://issues.apache.org/jira/browse/FOP-1969
>             Project: FOP
>          Issue Type: Improvement
>          Components: unqualified
>    Affects Versions: trunk
>         Environment: Operating System: All
> Platform: All
>            Reporter: Glenn Adams
>         Attachments: testing.fo, testing.fo, testing.pdf, testing.pdf, 
> testing.xml, testing.xsl
> unicode codepoints outside of the BMP (base multilingual plane), i.e., whose 
> scalar value is greater than 0xFFFF (65535), are coded as UTF-16 surrogate 
> pairs in Java strings, which pair should be treated as a single codepoint for 
> the purpose of mapping to a glyph in a font (that supports extra-BMP 
> mappings);
> at present, FOP does not correctly handle this case in simple (non complex 
> script) rendering paths;
> furthermore, though some support has been added to handle this in the complex 
> script rendering path, it has not yet been tested, so is not necessarily 
> working there either;
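
The surrogate-pair behaviour described in the issue can be seen with plain JDK 
calls. The following is a minimal sketch, assuming nothing beyond 
{{java.lang.String}} and {{java.lang.Character}}; the class 
{{SurrogatePairSketch}} and the method {{codePointsOf}} are hypothetical names 
for illustration and do not exist in FOP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: walks a String by code point so that a surrogate pair
// is seen as one unit, which is what glyph mapping needs.
final class SurrogatePairSketch {

    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);        // combines a valid high/low pair
            result.add(cp);
            i += Character.charCount(cp);     // advances by 2 for non-BMP, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "A\uD83D\uDE00";           // 'A' followed by U+1F600
        System.out.println(s.length());       // number of UTF-16 chars
        System.out.println(codePointsOf(s));  // code points as decimal integers
    }
}
```

A loop that advances one char at a time would instead hand each half of the 
pair to the font separately, and neither half maps to a glyph on its own.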

This message was sent by Atlassian JIRA
