[
https://issues.apache.org/jira/browse/PDFBOX-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009197#comment-13009197
]
Adam Nichols commented on PDFBOX-978:
-------------------------------------
Fixed in revision 1083858. Thanks again.
> unreading of trailing content after 'endobj' is missing new line byte (fix included)
> ------------------------------------------------------------------------------------
>
> Key: PDFBOX-978
> URL: https://issues.apache.org/jira/browse/PDFBOX-978
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.6.0
> Reporter: Timo Boehme
> Assignee: Adam Nichols
> Fix For: 1.6.0
>
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> I have several journal PDFs in which the last xref section starts like this:
> endobj xref
> 0 92
> 0000000000 65535 f
> 0000000044 00000 n
> In these cases the PDF parser reads the endobj line completely and unreads " xref".
> However, the newline (in this case ^D) is lost. This is already documented in the
> method readline() within PDFParser:
> "Note: if you later unread the results of this function, you'll
> need to add a newline character to the end of the string."
> Currently I get an error like "expected='obj' actual='655'" because
> 'xref' is read as 'xref0'.
> The fix: in PDFParser, insert the following lines before line 579 (the
> unreading of trailing characters after 'endobj'):
> // add a space first in place of the newline consumed by readline()
> pdfSource.unread( SPACE_BYTE );
> Thus we get:
> if (endObjectKey.startsWith( "endobj" ) )
> {
>     /*
>      * Some PDF files don't contain a new line after endobj so we
>      * need to make sure that the next object number is getting read separately
>      * and not part of the endobj keyword. Ex. Some files would have "endobj28"
>      * instead of "endobj"
>      */
>     // add a space first in place of the newline consumed by readline()
>     pdfSource.unread( SPACE_BYTE );
>     pdfSource.unread( endObjectKey.substring( 6 ).getBytes("ISO-8859-1") );
> }
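
The ordering in the fix matters because bytes pushed back last are read first. The following is a minimal standalone sketch of that behaviour using only java.io.PushbackInputStream. It is not PDFBox code: the class UnreadDemo, its helper methods, and the sample input are hypothetical stand-ins for the parser's readline() and token reading, and a literal ' ' is used where the fix uses SPACE_BYTE.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics the situation described in the issue.
class UnreadDemo {

    // Reads up to the next '\n' and consumes it, so the line terminator is dropped.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Skips leading whitespace, then reads one whitespace-delimited token.
    static String readToken(PushbackInputStream in) throws IOException {
        int c;
        while ((c = in.read()) != -1 && Character.isWhitespace(c)) {
            // keep skipping whitespace
        }
        StringBuilder sb = new StringBuilder();
        while (c != -1 && !Character.isWhitespace(c)) {
            sb.append((char) c);
            c = in.read();
        }
        return sb.toString();
    }

    // Returns the first token seen after the trailing content of the
    // "endobj" line has been pushed back onto the stream.
    static String tokenAfterEndobj(boolean restoreSeparator) {
        try {
            byte[] data = "endobj xref\n0 92\n".getBytes(StandardCharsets.ISO_8859_1);
            PushbackInputStream in =
                    new PushbackInputStream(new ByteArrayInputStream(data), 64);

            String line = readLine(in); // "endobj xref"; the '\n' is gone
            if (restoreSeparator) {
                // Pushed back first, so it is read after the bytes pushed back below.
                in.unread(' ');
            }
            in.unread(line.substring(6).getBytes(StandardCharsets.ISO_8859_1));
            return readToken(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenAfterEndobj(false)); // prints "xref0"
        System.out.println(tokenAfterEndobj(true));  // prints "xref"
    }
}
```

Without the separator, the pushed-back " xref" runs straight into the "0" that starts the next line, which matches the 'xref0' symptom reported above. Pushing the space back before the trailing content restores the boundary.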