[ 
https://issues.apache.org/jira/browse/PDFBOX-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222990#comment-17222990
 ] 

Nicolas M commented on PDFBOX-5006:
-----------------------------------

Thanks for your detailed response. I know that PDF viewers are very lax and 
that matching them is not the goal of PDFBox, but it's not easy for me to 
know whether this is something you want to fix or not (as you already handle 
some broken PDFs)...

Anyway, if I open the PDF in Chrome and save it, PDFBox can open it without 
problems, but not if I download it with wget. I don't know how you tried, but 
I think the Chrome viewer fixes it when saving...
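
One way to compare the wget download with the Chrome-saved copy is to check 
whether each file contains a "%PDF-" header near its start, since the stack 
trace fails inside parseHeader. The following is only a minimal sketch using 
the JDK alone, not PDFBox code: the class name PdfHeaderCheck and the 
1024-byte scan window are assumptions made for illustration, and a missing 
header is just one possible cause of the exception.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not part of PDFBox): reports whether a file
// contains the "%PDF-" marker within its first SCAN_LIMIT bytes.
final class PdfHeaderCheck {
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Assumed scan window; chosen for illustration only.
    private static final int SCAN_LIMIT = 1024;

    // Returns true if the marker occurs anywhere in the first SCAN_LIMIT bytes of head.
    static boolean looksLikePdf(byte[] head) {
        int limit = Math.min(head.length, SCAN_LIMIT);
        for (int i = 0; i + MARKER.length <= limit; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (head[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    // Reads up to SCAN_LIMIT bytes from the file and checks them for the marker.
    static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[SCAN_LIMIT];
            int total = 0;
            int n;
            while (total < head.length
                    && (n = in.read(head, total, head.length - total)) > 0) {
                total += n;
            }
            return looksLikePdf(Arrays.copyOf(head, total));
        }
    }

    // Prints the size and the header check result for each path given on the command line.
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(arg + ": size=" + Files.size(p)
                    + " bytes, header found=" + looksLikePdf(p));
        }
    }
}
```

Running it on both copies (the wget download and the file saved from Chrome) 
would show whether the download is empty or starts with something other than 
a PDF header, for example an HTML page returned by the server.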

> java.io.IOException: Error: End-of-File, expected line during PDDocument.load
> -----------------------------------------------------------------------------
>
>                 Key: PDFBOX-5006
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5006
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.20, 2.0.21
>         Environment: Debian, MacOs, open JDK 12
>            Reporter: Nicolas M
>            Priority: Major
>
> I get an IOException when I try to open some PDFs using the library (calling 
> PDDocument.load(pdfFile)). Here are some URLs of affected PDFs (I think it's 
> the same problem for all of them):
>  * 
> [https://www.buerger.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf]
>  * 
> [http://www.geislerfarms.com/documents/filelibrary/Geisler_COVID_statement_0A7A094E1EFB7.pdf]
>  * 
> [http://www.sahealth.sa.gov.au/wps/wcm/connect/c736e1d5-932e-4f8a-8e56-52ab10a214fd/SALHN+Governing+Board+Minutes+-+5+March+2020.pdf?MOD=AJPERES&CACHEID=ROOTWORKSPACE-c736e1d5-932e-4f8a-8e56-52ab10a214fd-niR9I3J]
> I think the files are not well formed and don't respect the PDF spec, but I 
> can open them using other PDF viewers (like the Chrome PDF viewer, for example).
>  
> Here is the stack trace:
> {code:java}
> java.io.IOException: Error: End-of-File, expected line
>     at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1098)
>     at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2581)
>     at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2560)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1099)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1082)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1041)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:989)
> {code}


