[ 
https://issues.apache.org/jira/browse/PDFBOX-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683181#comment-13683181
 ] 

Timo Boehme commented on PDFBOX-1639:
-------------------------------------

Hi,

thanks for the patch. A few comments:
- object number must be positive (negative numbers should be treated as an 
error; PDF Spec 1.7 chap. 3.2.9)
- I would prefer throwing an IOException instead of an IllegalArgumentException 
for the following reasons:
  - IAE is a RuntimeException, so typically it won't be caught by PDFBox; 
parsing breaks here with no possibility of recovery, and even user code might 
not be prepared for it
  - if throwing IAE was meant to prevent the looping, then maybe the recovery 
procedure has to be changed?
  - readInt/readLong used in the methods will already throw an IOE for a 
malformed number, and a number that is too large or too small is also a 
malformed number
- we should use StringBuilder instead of StringBuffer in all cases (as long as 
only a single thread accesses the object)
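
A rough sketch of what I mean. The class and method names are made up for 
illustration and this is not the actual PDFBox parser code; in particular, 
unlike the readInt described in the issue, this version does not push the 
consumed bytes back on error:

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper, for illustration only (not part of PDFBox).
final class ObjectNumberSketch {

    /** Reads an optionally signed run of ASCII digits and parses it as a long. */
    static long readLong(PushbackInputStream in) throws IOException {
        // StringBuilder is sufficient: only one thread touches this buffer.
        StringBuilder text = new StringBuilder();
        int c = in.read();
        if (c == '-' || c == '+') {
            text.append((char) c);
            c = in.read();
        }
        while (c >= '0' && c <= '9') {
            text.append((char) c);
            c = in.read();
        }
        if (c != -1) {
            in.unread(c); // leave the delimiter for the caller
        }
        try {
            return Long.parseLong(text.toString());
        } catch (NumberFormatException e) {
            // Empty, sign-only, or out-of-range input: report it as a
            // checked parse error instead of a RuntimeException.
            throw new IOException("Malformed number '" + text + "'", e);
        }
    }

    /** Reads an object number and rejects negative values with an IOException. */
    static long readObjectNumber(PushbackInputStream in) throws IOException {
        long number = readLong(in);
        if (number < 0) {
            throw new IOException("Object number must not be negative: " + number);
        }
        return number;
    }
}
```

With something like this, a caller that already handles IOException sees a 
negative or out-of-range object number as just another malformed number.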

BR
Timo
                
> Infinite loop with PDFParser used by tika.
> ------------------------------------------
>
>                 Key: PDFBOX-1639
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1639
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.7.1, 1.8.2, 2.0.0
>            Reporter: Eric Leleu
>            Assignee: Eric Leleu
>         Attachments: PDFBox-1639.patch
>
>
> Hi,
> I encountered an issue in a production environment that caused a disk-full 
> error. :(
> Tika uses the PDFParser with the "forceParsing" boolean set to true in order 
> to continue parsing even if an error occurs.
> Two PDFs have an object number greater than the max int value, so the 
> readInt() method fails.
> Because of the "forceParsing" boolean, the parser tries to go to the next 
> object, but it can't: on error the readInt method backtracks over the bytes 
> it has read, so the "skipToNextObj" method does nothing and we try to parse 
> the same object indefinitely...
> The COSObjectKey object already uses a long as the object number, so we 
> should read a long instead of an integer during parsing, using a "readLong" 
> method to handle object numbers that are too large for an int.
> Do you agree with that?
> BR,
> Eric
