[jira] [Created] (FOP-2886) FOP 2.3 Generates Truncated/Corrupted PDF with Mathematical Unicode Characters

Lawrence Thibodeaux (Jira) Tue, 29 Oct 2019 17:52:35 -0700

Lawrence Thibodeaux created FOP-2886:
----------------------------------------


             Summary: FOP 2.3 Generates Truncated/Corrupted PDF with 
Mathematical Unicode Characters
                 Key: FOP-2886
                 URL: https://issues.apache.org/jira/browse/FOP-2886
             Project: FOP
          Issue Type: Bug
          Components: renderer/pdf
    Affects Versions: 2.3
         Environment: Reproduced on Ubuntu 18.04.3 LTS
            Reporter: Lawrence Thibodeaux
         Attachments: ActualResult.pdf, ApproximateExpectedResult.pdf, 
fo_setup.xsl, name2fo.xsl, reproHtml.txt, xhtml2fo.xsl

*Overview:*
{quote}We use FOP 2.3 to generate PDFs based on HTML. We have found that the 
inclusion of a large number of certain Mathematical Unicode characters (such as 
[https://www.compart.com/en/unicode/U+1D538] ) allows the PDF to be created 
without error, but the PDF generated cannot be opened by any PDF viewer.{quote}
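
One property of the linked character that may be relevant: U+1D538 lies outside the Basic Multilingual Plane, so Java represents it as a surrogate pair of two `char` values. Whether this is connected to the truncation is an assumption, not something established in this report. A minimal sketch for counting such characters in the input HTML, using only the JDK (the class and method names are hypothetical):

```java
final class SupplementaryChars {
    // Counts code points above U+FFFF (stored as surrogate pairs in a Java String).
    static long countSupplementary(String s) {
        return s.codePoints()
                .filter(Character::isSupplementaryCodePoint)
                .count();
    }
}
```

Running this over the contents of reproHtml.txt and over a variant that renders correctly would show whether the two inputs differ in how many of these characters they contain.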
{quote}We also use Lowagie PdfReader to validate that the PDF we generate is 
well-formed. The PdfReader threw the following Exception:
 com.itextpdf.text.exceptions.InvalidPdfException: Rebuild failed: trailer not 
found.; Original message: PDF startxref not found.{quote}
{quote}Manual inspection confirms that the trailer is indeed missing. We've 
seen that this can happen when the input and output streams are not closed or 
flushed properly. In our case, we use the Java try-with-resources pattern to 
invoke close() automatically, so I don't believe this is our issue. I have 
also tried, in vain, closing our streams manually and switching the order in 
which the close() calls happen.{quote}
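
The manual trailer inspection described above can be approximated in code without a PDF library. The following is a rough sketch, not the check that PdfReader performs: it only looks for the `startxref` and `%%EOF` markers near the end of the generated bytes. The class name, method name, and the 1024-byte window are choices made for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class PdfTrailerCheck {
    // Returns true if both "startxref" and "%%EOF" appear in the last
    // 1024 bytes of the data (or in the whole array, if it is shorter).
    static boolean hasTrailerMarkers(byte[] pdf) {
        int tailLen = Math.min(pdf.length, 1024);
        // ISO-8859-1 maps every byte to exactly one char, so no bytes are lost.
        String tail = new String(pdf, pdf.length - tailLen, tailLen,
                StandardCharsets.ISO_8859_1);
        return tail.contains("startxref") && tail.contains("%%EOF");
    }
}
```

Calling `PdfTrailerCheck.hasTrailerMarkers(byteArrayOutputStream.toByteArray())` after the conversion returns gives a quick signal of whether the output was truncated before the trailer was written.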
*Steps to Reproduce:*
{quote}I have not been able to reproduce outside of our software, 
unfortunately, but I've included the HTML that causes the problem 
(reproHtml.txt) and the .xsl files we use. This is the code snippet that we use 
to convert the input HTML into a ByteArrayOutputStream:{quote}
{quote}
{code:java}
public void generatePdfWithCssToXslFo(
        final String htmlString,
        final OutputStream outputStream
) throws CSSToXSLFOException, SAXException, IOException {
    try (final Reader htmlReader = new StringReader(htmlString)) {
        final InputSource source = new InputSource(htmlReader);
        final boolean isValidatingParser = false;
        final boolean cssToXslFoDebugEnabled =
                System.getProperty("be.re.css.debug") != null;

        // Set up FOP to take the XSL-FO and turn it into a PDF
        final Fop fop;
        final FOUserAgent userAgent;

        FopFactoryBuilder builder = new FopFactoryBuilder(
                URI.create(resourceLoader.getResource(resourceBasePath).getURI().toString()),
                new ClasspathResolverURIAdapter());
        builder.setConfiguration(configuration);

        FopFactory factory = builder.build();
        userAgent = factory.newFOUserAgent();
        userAgent.setAuthor("Indeed");
        userAgent.setCreator("Indeed Resume");
        userAgent.setTitle("Indeed Resume");
        userAgent.setKeywords("Indeed Resume");

        fop = factory.newFop(MimeConstants.MIME_PDF, userAgent, outputStream);

        // Set up CSSToXSLFO to transform the XHTML into XSL-FO
        final URL baseUrl = resourceLoader.getResource(resourceBasePath).getURL();
        Loggers.debug(LOGGER, "Parsing HTML response using base URL '%s'", baseUrl);
        final XMLReader xmlParser = Util.getParser(null, isValidatingParser);
        final ProtectEventHandlerFilter eventHandlerFilter =
                new ProtectEventHandlerFilter(true, true, xmlParser);

        final XMLReader filter =
                new CSSToXSLFOFilter(
                        baseUrl,
                        null,
                        Collections.EMPTY_MAP,
                        eventHandlerFilter,
                        cssToXslFoDebugEnabled);

        filter.setEntityResolver(classPathEntityResolver);
        filter.setContentHandler(fop.getDefaultHandler());
        filter.parse(source);
    }
}{code}
{quote}
*Actual Results:*
{quote}The attached PDF is created (ActualResult.pdf){quote}
*Expected Results:*
{quote}An intact PDF can be created. For example, I've attached 
ApproximateExpectedResult.pdf where I've replaced the first letter with my 
name, which allows the PDF to render.{quote}
*Build Date & Hardware:* Date and hardware of the build in which you first 
encountered the bug.
{quote}FOP version 2.3, Build 2014-07-15 on Ubuntu 18.04.3 LTS{quote}
*Additional Builds and Platforms:* Whether or not the bug takes place on other 
platforms (or browsers, if applicable).
{quote}(Unable to test on other platforms.){quote}
*Additional Information:* 

As you can see in the ApproximateExpectedResult.pdf, there is a mix of these 
Mathematical characters and normal Latin letter characters. Adding additional 
Latin characters or removing any of the Mathematical characters can sometimes 
allow the PDF to render, but it's hard to predict - I was not able to link it 
to any particular character or word.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
