[ 
https://issues.apache.org/jira/browse/PDFBOX-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841146#comment-13841146
 ] 

Maruan Sahyoun commented on PDFBOX-1796:
----------------------------------------

Hi Manfred,

I added a fix to the fix :-) With that applied, the parser no longer runs into 
an infinite loop on your files. dls.pdf is now handled fine, e.g. when 
extracting text. rsag.pdf has some further issues. As there is a newer parser 
which is more in line with the PDF spec, you should use 

PDDocument.loadNonSeq() instead of PDDocument.load()

If you're fine with that, can we close the issue? I think the problem described is solved.

BR and thanks for your report.
Maruan

> Infiniteloop BaseParser.java:1010
> ---------------------------------
>
>                 Key: PDFBOX-1796
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1796
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.3
>            Reporter: Manfred Schauer
>         Attachments: dls.pdf, rsag.pdf
>
>
> infinite loop at 
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSHexString(BaseParser.java:1010)
> private final COSString parseCOSHexString() throws IOException
> {
> ...
>             // read till the closing bracket was found
>             do 
>             {
>                 c = pdfSource.read();
>             } while ( c != '>' );
> ...   
> If pdfSource.read() returns EOF, the loop never terminates.
> Testcase:
> PDDocument doc = PDDocument.load (new FileInputStream("..."));
> Two real-world PDF files that cause the loop are attached. I do not know 
> whether they are completely valid PDFs, but at least Preview on Mac OS X 
> displays them.
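
For illustration, the loop quoted above can be made to terminate by also 
stopping when read() returns -1, the end-of-stream value of 
java.io.InputStream. This is only a minimal sketch using plain JDK streams: 
the class HexStringSkip and the method skipToClosingBracket are hypothetical 
names, not PDFBox code, and the fix actually committed for PDFBOX-1796 may 
look different.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not part of PDFBox.
class HexStringSkip {

    /**
     * Consumes bytes until the closing '>' of a hex string has been read.
     *
     * @return true if '>' was found, false if the stream ended first
     */
    static boolean skipToClosingBracket(InputStream in) throws IOException {
        int c;
        do {
            c = in.read();               // -1 signals end of stream
        } while (c != '>' && c != -1);   // stop on '>' OR on EOF
        return c == '>';
    }

    public static void main(String[] args) throws IOException {
        InputStream terminated = new ByteArrayInputStream(
                "48656C6C6F>".getBytes(StandardCharsets.US_ASCII));
        InputStream truncated = new ByteArrayInputStream(
                "48656C".getBytes(StandardCharsets.US_ASCII));

        // The first stream contains '>', the second ends without it.
        System.out.println(skipToClosingBracket(terminated));
        System.out.println(skipToClosingBracket(truncated));
    }
}
```

Returning a boolean lets the caller decide how to treat a truncated hex 
string, for example by throwing an IOException or by recovering leniently.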



--
This message was sent by Atlassian JIRA
(v6.1#6144)
