[ https://issues.apache.org/jira/browse/PDFBOX-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681983#comment-13681983 ]
Eric Leleu commented on PDFBOX-1639:
------------------------------------
Hi,
I'm talking about the "object number", the first element of an object
reference, which comes before the "generation number".
Here is the snippet:
3793681698 0 obj
<<
/Type /DPart
/Parent 805 0 R
/DParts ...
>>
endobj
The "object number" 3793681698 is greater than the max int value (2147483647),
so it must be read as a long, not an integer.
I have fixed this issue by using a "readLong" method to read the "object number"
and the "generation number" in each parser.
It works fine, and because COSObjectKey already uses a long for these two
attributes, this shouldn't introduce a regression.
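To illustrate the idea, here is a minimal sketch of reading both numbers as longs. The class and method names (ObjectHeaderSketch, readObjectHeader) are hypothetical and this is not the actual parser code; it only assumes that the header has the form "<object number> <generation number> obj".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the actual PDFBox parser code.
final class ObjectHeaderSketch {

    // Skips spaces, then reads a run of ASCII digits and parses it as a long.
    static long readLong(PushbackInputStream in) {
        try {
            int c = in.read();
            while (c == ' ') {
                c = in.read();
            }
            StringBuilder digits = new StringBuilder();
            while (c >= '0' && c <= '9') {
                digits.append((char) c);
                c = in.read();
            }
            if (c != -1) {
                in.unread(c); // the first non-digit belongs to the next token
            }
            // Throws NumberFormatException if no digits were found or the
            // value does not fit in a long.
            return Long.parseLong(digits.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the two numbers that start an object header such as
    // "3793681698 0 obj" and returns {objectNumber, generationNumber}.
    static long[] readObjectHeader(String header) {
        PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(header.getBytes(StandardCharsets.ISO_8859_1)));
        long objectNumber = readLong(in);
        long generationNumber = readLong(in);
        return new long[] {objectNumber, generationNumber};
    }

    public static void main(String[] args) {
        long[] key = readObjectHeader("3793681698 0 obj");
        System.out.println(key[0] + " " + key[1]);
        try {
            Integer.parseInt("3793681698"); // larger than Integer.MAX_VALUE
        } catch (NumberFormatException e) {
            System.out.println("does not fit in an int");
        }
    }
}
```

With the object number from the snippet above, the long-based read succeeds, whereas Integer.parseInt rejects the same digits.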
If you agree, I can commit this fix today.
BR,
Eric
> Infinite loop with PDFParser used by tika.
> ------------------------------------------
>
> Key: PDFBOX-1639
> URL: https://issues.apache.org/jira/browse/PDFBOX-1639
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.7.1, 1.8.2, 2.0.0
> Reporter: Eric Leleu
> Assignee: Eric Leleu
>
> Hi,
> I encountered an issue in a production environment that caused a disk full
> error. :(
> Tika uses the PDFParser with the "forceParsing" boolean set to true so that
> parsing continues even if an error occurs.
> Two PDFs have an object number greater than the max int value, so the
> readInt() method fails.
> Because of the "forceParsing" boolean, the parser tries to go to the next
> object, but it can't: on error, the readInt() method backtracks over the bytes
> it has read, so the "skipToNextObj" method does nothing and we try to parse
> the same object indefinitely...
> The COSObjectKey object already uses a long as its object number, so we should
> read a long instead of an integer during parsing, using a "readLong" method
> to handle object numbers that are too large.
> Do you agree?
> BR,
> Eric
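The loop described in the quoted report can be modelled roughly as follows. This is a simplified sketch with hypothetical names (ForceParsingSketch, countStuckAttempts), not the actual PDFBox code; in particular, the behaviour of skipToNextObj here is an assumption: it stops as soon as the current position already looks like the start of an object.

```java
// Hypothetical model of the failure; not the actual PDFBox code.
final class ForceParsingSketch {
    private final String data;
    private int pos = 0;

    ForceParsingSketch(String data) {
        this.data = data;
    }

    // Reads a run of digits as an int; on failure, restores the position.
    int readInt() {
        int start = pos;
        while (pos < data.length() && Character.isDigit(data.charAt(pos))) {
            pos++;
        }
        try {
            return Integer.parseInt(data.substring(start, pos));
        } catch (NumberFormatException e) {
            pos = start; // backtrack over the bytes that were read
            throw e;
        }
    }

    // Assumed behaviour: advance until the position looks like an object start.
    void skipToNextObj() {
        while (pos < data.length() && !Character.isDigit(data.charAt(pos))) {
            pos++;
        }
    }

    // Retries as a "forceParsing" loop would, capped at maxAttempts, and
    // returns how many attempts left the position unchanged.
    int countStuckAttempts(int maxAttempts) {
        int stuck = 0;
        for (int i = 0; i < maxAttempts; i++) {
            int before = pos;
            try {
                readInt();
                break; // parsed successfully, no need to retry
            } catch (NumberFormatException e) {
                skipToNextObj(); // already at a digit, so nothing is skipped
            }
            if (pos == before) {
                stuck++;
            }
        }
        return stuck;
    }
}
```

In this model, every attempt on "3793681698 0 obj" fails and leaves the position where it started, so without the cap the loop would never end; a header such as "805 0 obj" is parsed on the first attempt.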