FOP 2.3 Generates Truncated/Corrupted PDF with Mathematical Unicode Characters

Lawrence Thibodeaux Mon, 28 Oct 2019 10:18:04 -0700

Hello,

We use FOP 2.3 to generate PDFs from HTML, and in some very rare cases the
resulting PDF appears to be truncated and will not open in any PDF viewer.
What it is about the HTML that triggers the problem is truly mysterious to
us, and I would appreciate help determining why this particular HTML causes
it.

We detected the issue because we use the Lowagie (iText) PdfReader to
validate that each generated PDF is well-formed. PdfReader threw the
following exception:

com.itextpdf.text.exceptions.InvalidPdfException: Rebuild failed: trailer
not found.; Original message: PDF startxref not found.
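The message says the reader could not locate the `startxref` keyword, which a
complete PDF places near the very end of the file just before `%%EOF`. As a
quick check independent of iText, a sketch like the one below looks for both
markers in the last bytes of the file. `hasPdfTrailer` is a hypothetical
helper and the 1024-byte window is an assumption; this is a heuristic for
spotting truncation, not a PDF validator.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfTrailerCheck {

    // Heuristic (hypothetical helper): a complete PDF ends with "startxref",
    // a byte offset and "%%EOF". A truncated file usually lacks these in its
    // final bytes. The 1024-byte window is an assumption.
    public static boolean hasPdfTrailer(byte[] pdf) {
        int tailLength = Math.min(pdf.length, 1024);
        // ISO-8859-1 maps each byte to exactly one char, so binary data
        // survives the conversion unchanged
        String tail = new String(pdf, pdf.length - tailLength, tailLength,
                StandardCharsets.ISO_8859_1);
        int startxref = tail.lastIndexOf("startxref");
        int eof = tail.lastIndexOf("%%EOF");
        return startxref >= 0 && eof > startxref;
    }

    public static void main(String[] args) throws Exception {
        Path path = Paths.get(args.length > 0 ? args[0] : "output.pdf");
        byte[] pdf = Files.readAllBytes(path);
        System.out.println(path + ": " + (hasPdfTrailer(pdf)
                ? "trailer present"
                : "trailer missing (likely truncated)"));
    }
}
```

Running it against a failing file shows whether the trailer was never
written (truncation) or was written but is malformed in some other way.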

While researching this exception, I found that in every case I came across,
the user fixed the problem by making sure the input and output streams were
closed or flushed properly. In our case, we use Java's try-with-resources
pattern, which calls close() automatically, so I don't believe this is our
issue.
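One subtle way try-with-resources can still yield a truncated result is
reading the bytes (for example, passing them to the validator) inside the try
block, before close() has flushed a buffering wrapper. A PDF writer emits the
xref table and trailer last, so those are exactly the bytes still sitting in
the buffer. The sketch below shows the effect with plain JDK streams;
`writeDocument` is a hypothetical stand-in for the renderer, not FOP code,
and this is not a diagnosis of the problem described here.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferedCloseDemo {

    // Hypothetical stand-in for the renderer: writes a header, then the
    // trailer last, as a PDF writer does when the document ends
    static void writeDocument(OutputStream out) throws IOException {
        out.write("%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1));
        out.write("startxref\n0\n%%EOF\n".getBytes(StandardCharsets.ISO_8859_1));
    }

    // Returns {bytes visible in the sink inside the try block,
    //          bytes visible after close()}
    public static int[] sizes() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int insideTry;
        try (OutputStream out = new BufferedOutputStream(sink)) {
            writeDocument(out);
            // Still buffered: reading the sink here sees an incomplete file
            insideTry = sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // close() has flushed the buffer into the sink
        return new int[] {insideTry, sink.size()};
    }

    public static void main(String[] args) {
        int[] s = sizes();
        System.out.println("inside try: " + s[0] + " bytes, after close: "
                + s[1] + " bytes");
    }
}
```

If the generated bytes are validated inside the same try block that owns the
output stream, moving the validation after the block is a cheap way to rule
this out.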

The majority of the characters rendered in the PDF are Mathematical
Double-Struck characters (e.g. https://www.compart.com/en/unicode/U+1D538),
but not exclusively; many are ordinary Latin letters. The problem seems to
be linked to the number of characters rather than to particular characters:
I can make it go away by deleting enough characters, or by restoring
previously deleted characters and deleting different ones instead. In fact,
sometimes adding more Latin characters allows the PDF to render correctly.
Because we are able to render the PDF in some cases, I believe we have the
fonts necessary to render the Mathematical characters.
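One property that sets these characters apart from the Latin ones:
U+1D538 lies outside the Basic Multilingual Plane, so a Java String stores
it as a surrogate pair of two `char`s. Code that counts `char`s where it
means characters, or splits text between the two halves of a pair, is a
common source of bugs with such characters. Whether anything like that is
happening inside FOP 2.3 here is not established; the sketch below only
shows how the counts diverge. `sample` is a hypothetical helper.

```java
public class SupplementaryCharDemo {

    // Hypothetical helper: U+1D538 MATHEMATICAL DOUBLE-STRUCK CAPITAL A
    // followed by a plain Latin "A"
    public static String sample() {
        return new StringBuilder().appendCodePoint(0x1D538).append('A').toString();
    }

    public static void main(String[] args) {
        String s = sample();
        // 3: the double-struck letter needs two UTF-16 chars (a surrogate pair)
        System.out.println("UTF-16 chars: " + s.length());
        // 2: one supplementary code point plus one BMP code point
        System.out.println("code points:  " + s.codePointCount(0, s.length()));
        // true: the first char is the high half of the pair
        System.out.println("high surrogate first: " + Character.isHighSurrogate(s.charAt(0)));
        // Prints U+1D538 and U+0041, one line each
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```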

I understand there are many factors in play, so I've tried to provide only
the relevant information; please let me know if other details would help
determine the cause. I have omitted the HTML because it is over 200 lines
long, but I'm happy to provide it if you'd like to look at it.

Thanks,
Lawrence