[ 
https://issues.apache.org/jira/browse/PDFBOX-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223100#comment-17223100
 ] 

Tilman Hausherr commented on PDFBOX-5006:
-----------------------------------------

The original stack trace usually means an empty file.

I can reproduce this with PDFDebugger by loading the URL. The redirects may 
be the cause. What works is following them (I used Xenu's Link Sleuth to see 
where they go) and downloading from
https://www.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf
instead of 
https://www.buerger.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf
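
To illustrate the workaround, here is a minimal sketch that downloads a URL 
while following redirects and rejects an empty or non-PDF body before it 
reaches PDFBox. It uses only the JDK (java.net.http, Java 11+). The class 
PdfPrecheck and its methods are hypothetical names, not part of PDFBox, and 
the 1024-byte header window is an assumption chosen for the example. Whether 
redirects are really the cause for this file is not confirmed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of PDFBox.
class PdfPrecheck {

    // Returns true if the data is non-empty and the "%PDF-" marker appears
    // within the first 1024 bytes (assumed window for this sketch).
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length == 0) {
            return false;
        }
        int limit = Math.min(data.length, 1024);
        String head = new String(data, 0, limit, StandardCharsets.ISO_8859_1);
        return head.contains("%PDF-");
    }

    // Downloads the URL. Redirect.NORMAL follows every redirect except
    // HTTPS -> HTTP.
    static byte[] download(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode()
                    + " for " + response.uri());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PdfPrecheck.java <url>");
            return;
        }
        byte[] data = download(args[0]);
        if (!looksLikePdf(data)) {
            System.err.println("Body is empty or has no %PDF- header; not loading it.");
            return;
        }
        // At this point the bytes could be passed to PDDocument.load(byte[]).
        System.out.println("Downloaded " + data.length + " bytes with a PDF header.");
    }
}
```

An empty body fails the check, which gives a clearer message than the 
"End-of-File, expected line" exception from the parser.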

> java.io.IOException: Error: End-of-File, expected line during PDDocument.load
> -----------------------------------------------------------------------------
>
>                 Key: PDFBOX-5006
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5006
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.20, 2.0.21
>         Environment: Debian, MacOs, open JDK 12
>            Reporter: Nicolas M
>            Priority: Major
>         Attachments: Rehbein_Schule_Hanau_9_2018.pdf, 
> Rehbein_Schule_Hanau_9_2018.txt
>
>
> I get an IOException when I try to open some PDFs with the library (calling 
> PDDocument.load(pdfFile)). Here are some URLs of affected PDFs (I think it's 
> the same problem for all of them):
>  * [https://www.buerger.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf]
>  * [http://www.geislerfarms.com/documents/filelibrary/Geisler_COVID_statement_0A7A094E1EFB7.pdf]
>  * [http://www.sahealth.sa.gov.au/wps/wcm/connect/c736e1d5-932e-4f8a-8e56-52ab10a214fd/SALHN+Governing+Board+Minutes+-+5+March+2020.pdf?MOD=AJPERES&CACHEID=ROOTWORKSPACE-c736e1d5-932e-4f8a-8e56-52ab10a214fd-niR9I3J]
> I think the files are not well formed and don't follow the PDF specification, 
> but I can open them with other PDF viewers (the Chrome PDF viewer, for example).
>  
> Here is the stack trace : 
> {code:java}
> java.io.IOException: Error: End-of-File, expected line
>     at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1098)
>     at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2581)
>     at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2560)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1099)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1082)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1041)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:989)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
