[ https://issues.apache.org/jira/browse/PDFBOX-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934353#action_12934353 ]
Martijn Brinkers commented on PDFBOX-818:
-----------------------------------------
I tried to reproduce the problem with the PDF from the link, but I can
extract text from it without any problems.
> PDFParser fails if object/xref starts at same line as endobj of a stream
> object
> -------------------------------------------------------------------------------
>
> Key: PDFBOX-818
> URL: https://issues.apache.org/jira/browse/PDFBOX-818
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.3.1
> Reporter: Timo Boehme
> Attachments: pdfbox_issue818.patch
>
>
> If an object or xref starts on the same line as the 'endobj' token and the
> closed object contains a stream, parsing of the next object fails.
> Example:
> endstream
> endobj xref
> 0 26
> In PDFParser, if an object contains a stream, the 'endobj' token is read via
> readLine(), so the line break is consumed as well. The 'endobj' and the
> keyword following it are then handled, but only 'xref' is pushed back and not
> the line break, which results in 'xref0' when trying to read the next object.
> In this case a simple solution is to push back a space byte before the 'xref'.
> I will add a patch for it.
> Part of the problem can be seen in the PDF from
> http://onlinelibrary.wiley.com/doi/10.1111/j.1399-6576.2009.02134.x/pdf at
> the last 'endobj'. However, the last object does not contain a stream, and I
> was not able to produce such a PDF myself (the PDFs I have that contain the
> problematic construct are unfortunately confidential).
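
The push-back behaviour described in the issue can be illustrated with a small
standalone sketch. It does not use PDFBox and is not the attached patch: the
class EndobjPushbackSketch and its helpers readLine, readToken and
nextTokenAfterEndobj are hypothetical names, built only on
java.io.PushbackInputStream, and they assume '\n' line breaks. Because unread()
places bytes at the front of the stream, the space has to be pushed back first
so that it ends up between 'xref' and the following '0'.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not PDFBox's PDFParser.
class EndobjPushbackSketch {

    // Reads up to the next '\n' and consumes it; returns the line without it.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Skips leading whitespace, then reads one whitespace-delimited token.
    static String readToken(PushbackInputStream in) throws IOException {
        int c = in.read();
        while (c != -1 && Character.isWhitespace(c)) {
            c = in.read();
        }
        StringBuilder sb = new StringBuilder();
        while (c != -1 && !Character.isWhitespace(c)) {
            sb.append((char) c);
            c = in.read();
        }
        return sb.toString();
    }

    // Reads the "endobj ..." line, pushes back whatever follows 'endobj',
    // and returns the next token seen by the reader.
    static String nextTokenAfterEndobj(String data, boolean pushBackSeparator) {
        try {
            PushbackInputStream in = new PushbackInputStream(
                    new ByteArrayInputStream(data.getBytes(StandardCharsets.ISO_8859_1)), 64);
            String line = readLine(in); // the line break is consumed here
            String rest = line.substring("endobj".length()).trim();
            if (pushBackSeparator) {
                // Pushed back first, so it is read after 'rest'.
                in.unread(' ');
            }
            in.unread(rest.getBytes(StandardCharsets.ISO_8859_1));
            return readToken(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String data = "endobj xref\n0 26\n";
        System.out.println(nextTokenAfterEndobj(data, false)); // prints "xref0"
        System.out.println(nextTokenAfterEndobj(data, true));  // prints "xref"
    }
}
```

Without the separator the pushed-back 'xref' runs directly into the '0' of the
next line; with the space pushed back first, the next token is 'xref' again.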