[https://issues.apache.org/jira/browse/PDFBOX-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733319#comment-13733319]
Tilman Hausherr commented on PDFBOX-1668:
-----------------------------------------
No, it *is* the same part of the code, at line 1004.
Before looking at the patch, I tried checking for EOF without throwing an
exception, and it worked. I then tried Christian's patch, and loading stopped
(not a surprise). Considering that Andreas wrote "We try to make PDFBox being
lenient towards malformed pdfs", I suggest something like this:
// read till the closing bracket was found
do
{
    if (pdfSource.isEOF())
    {
        LOG.warn("parseCOSHexString(): Premature EOF");
        break;
    }
    c = pdfSource.read();
} while ( c != '>' );
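A self-contained sketch of the same idea follows. It assumes a plain java.io.InputStream instead of PDFBox's pdfSource, so the isEOF() check becomes a check for read() returning -1; the class name LenientHexSkip and the method skipToClosingBracket are hypothetical. It also suggests why the unpatched loop presumably spins forever: once the stream is exhausted, read() keeps returning -1, which never equals '>'.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

// Hypothetical helper illustrating a lenient "skip to closing bracket" loop.
public class LenientHexSkip {
    private static final Logger LOG = Logger.getLogger(LenientHexSkip.class.getName());

    /**
     * Consumes bytes up to and including the closing '>' of a hex string.
     * Returns true if '>' was found, false if the stream ended first.
     * Without the -1 check, do { c = in.read(); } while (c != '>') would loop
     * forever on a truncated hex string, because read() keeps returning -1.
     */
    public static boolean skipToClosingBracket(InputStream in) throws IOException {
        int c;
        do {
            c = in.read();
            if (c == -1) {
                // Lenient: warn and give up instead of throwing or hanging.
                LOG.warning("skipToClosingBracket(): Premature EOF");
                return false;
            }
        } while (c != '>');
        return true;
    }

    public static void main(String[] args) throws IOException {
        InputStream complete = new ByteArrayInputStream("4E6F>".getBytes(StandardCharsets.US_ASCII));
        InputStream truncated = new ByteArrayInputStream("4E6F".getBytes(StandardCharsets.US_ASCII));
        System.out.println(skipToClosingBracket(complete));  // prints "true"
        System.out.println(skipToClosingBracket(truncated)); // logs a warning, prints "false"
    }
}
```

Returning a flag rather than throwing lets the caller keep whatever it already parsed, which matches the lenient behaviour suggested above.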
The third page of Christian's file cannot be displayed in Acrobat Reader, but
PDFBox renders it, and that is pretty cool IMHO.
> Loading a Russian PDF never finishes
> -------------------------------------
>
> Key: PDFBOX-1668
> URL: https://issues.apache.org/jira/browse/PDFBOX-1668
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Sergio Fernández
> Priority: Minor
>
> Try to run this line:
> PDDocument.load(new URL("http://www.who.int/entity/foodsafety/publications/general/en/global_strategy_ru.pdf"));
> The loading never finishes and takes a lot of CPU.
> The document size (574K) should not be the problem. I guess something in that
> document causes the issue with PDFBox, and I'd like to know whether this could
> be a more general issue.
> Thanks!