[ https://issues.apache.org/jira/browse/FOP-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602625#comment-15602625 ]
Simone Rondelli commented on FOP-1969:
--------------------------------------
The project already uses org.apache.pdfbox:fontbox, so I thought that
adding org.apache.pdfbox:pdfbox as a test dependency would not be a big deal.
And yes, I did not consider the Ant build, but I can easily add the jar.
PDDocument does not contain a load() method. The current implementation of
extractTextFromPDF() works as expected.
> Surrogate pairs not treated as single unicode codepoint for display purposes
> ----------------------------------------------------------------------------
>
> Key: FOP-1969
> URL: https://issues.apache.org/jira/browse/FOP-1969
> Project: FOP
> Issue Type: Improvement
> Components: unqualified
> Affects Versions: trunk
> Environment: Operating System: All
> Platform: All
> Reporter: Glenn Adams
> Attachments: Urdu.zip, pcltest.zip, single-byte.zip, testing.fo,
> testing.fo, testing.pdf, testing.pdf, testing.xml, testing.xsl, tiffttc.zip
>
>
> Unicode code points outside of the BMP (Basic Multilingual Plane), i.e., whose
> scalar value is greater than 0xFFFF (65535), are coded as UTF-16 surrogate
> pairs in Java strings. Each such pair should be treated as a single code point
> for the purpose of mapping to a glyph in a font (that supports extra-BMP
> mappings).
> At present, FOP does not correctly handle this case in simple (non-complex
> script) rendering paths.
> Furthermore, though some support has been added to handle this in the complex
> script rendering path, it has not yet been tested, so it is not necessarily
> working there either.
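
For illustration only, the behaviour the issue describes can be sketched with the standard java.lang.String and java.lang.Character APIs. This is not FOP code: the class name SurrogatePairDemo and the helper toCodePoints are hypothetical, and the sketch only shows how a surrogate pair collapses into one code point before any glyph lookup would happen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of FOP.
public class SurrogatePairDemo {

    /**
     * Walks a Java string by code point rather than by char, so that each
     * UTF-16 surrogate pair is returned as one scalar value.
     */
    public static List<Integer> toCodePoints(String text) {
        List<Integer> codePoints = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);    // joins a high/low surrogate pair
            codePoints.add(cp);
            i += Character.charCount(cp);    // advances 1 char for BMP, 2 otherwise
        }
        return codePoints;
    }

    public static void main(String[] args) {
        // 'A' followed by U+1F600, which Java stores as the pair D83D DE00.
        String text = "A\uD83D\uDE00";
        System.out.println("chars: " + text.length());
        for (int cp : toCodePoints(text)) {
            System.out.printf("U+%04X supplementary=%b%n",
                    cp, Character.isSupplementaryCodePoint(cp));
        }
    }
}
```

A char-by-char loop over the same string would see three units and would try to map each surrogate half to a glyph on its own, which is the failure mode reported above.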