[ https://issues.apache.org/jira/browse/PDFBOX-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223095#comment-17223095 ]
Nicolas M commented on PDFBOX-5006:
-----------------------------------

OK, I think I've got it. I was using these two lines to download the document:
{code:java}
File file = File.createTempFile(Long.toString(System.currentTimeMillis()), ".pdf");
FileUtils.copyURLToFile(new URL(path), file);
{code}
and the PDF is not downloaded correctly. Downloading it with wget and then loading it from a local path works.

I'm sorry for the waste of time. The issue can be closed.

> java.io.IOException: Error: End-of-File, expected line during PDDocument.load
> -----------------------------------------------------------------------------
>
>                 Key: PDFBOX-5006
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5006
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.20, 2.0.21
>         Environment: Debian, macOS, OpenJDK 12
>            Reporter: Nicolas M
>            Priority: Major
>         Attachments: Rehbein_Schule_Hanau_9_2018.pdf, Rehbein_Schule_Hanau_9_2018.txt
>
>
> I get an I/O exception when I try to open some PDFs using the library (calling
> PDDocument.load(pdfFile)).
> Here are some URLs of affected PDFs (I think it's the same problem for all of them):
> * [https://www.buerger.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf]
> * [http://www.geislerfarms.com/documents/filelibrary/Geisler_COVID_statement_0A7A094E1EFB7.pdf]
> * [http://www.sahealth.sa.gov.au/wps/wcm/connect/c736e1d5-932e-4f8a-8e56-52ab10a214fd/SALHN+Governing+Board+Minutes+-+5+March+2020.pdf?MOD=AJPERES&CACHEID=ROOTWORKSPACE-c736e1d5-932e-4f8a-8e56-52ab10a214fd-niR9I3J]
>
> I think the files are not well formed and don't respect the PDF spec, but I
> can open them with other PDF viewers (the Chrome PDF viewer, for example).
>
> Here is the stack trace:
> {code:java}
> java.io.IOException: Error: End-of-File, expected line
>     at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1098)
>     at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2581)
>     at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2560)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1099)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1082)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1041)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:989)
> {code}
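
For anyone who hits the same exception after fetching a PDF programmatically, here is a minimal sketch of a sanity check that could run on the downloaded file before it is handed to PDDocument.load. The trace fails while the parser reads the header, which would be consistent with an empty or truncated download, although the thread does not confirm what the temp file actually contained. The class name `PdfDownloadCheck` and the method `looksLikePdf` are hypothetical helpers invented for this illustration; they are not part of PDFBox or Commons IO, and the sketch uses only the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of PDFBox or Commons IO.
final class PdfDownloadCheck {

    // A PDF header starts with these bytes, e.g. "%PDF-1.4".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfDownloadCheck() {
    }

    /**
     * Returns true only if the file begins with the "%PDF-" marker.
     * An empty file, or an HTML error page saved under a .pdf name, returns false.
     */
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break; // end of file reached before the marker was complete
                }
                filled += n;
            }
        }
        return filled == PDF_MAGIC.length && Arrays.equals(head, PDF_MAGIC);
    }
}
```

A caller could run `PdfDownloadCheck.looksLikePdf(file.toPath())` after the download and before `PDDocument.load(file)`, and report a download problem instead of a parser error when it returns false. The check is deliberately strict: a file with stray bytes in front of the header would be rejected even if a lenient parser could still open it.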