[ 
https://issues.apache.org/jira/browse/PDFBOX-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223049#comment-17223049
 ] 

Nicolas M commented on PDFBOX-5006:
-----------------------------------

I tried the first URL:

*Rehbein_Schule_Hanau_9_2018*
{code:java}
nicolas@MacBook-Pro-de-Nicolas-Pro ~ % wget https://www.buerger.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf
--2020-10-29 17:59:32--  https://www.buerger.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf
Resolving www.buerger.uni-frankfurt.de (www.buerger.uni-frankfurt.de)... 141.2.37.41
Connecting to www.buerger.uni-frankfurt.de (www.buerger.uni-frankfurt.de)|141.2.37.41|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf? [following]
--2020-10-29 17:59:32--  http://www.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf?
Resolving www.uni-frankfurt.de (www.uni-frankfurt.de)... 141.2.37.41
Connecting to www.uni-frankfurt.de (www.uni-frankfurt.de)|141.2.37.41|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf? [following]
--2020-10-29 17:59:32--  https://www.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf?
Connecting to www.uni-frankfurt.de (www.uni-frankfurt.de)|141.2.37.41|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2479934 (2.4M) [application/pdf]
Saving to: ‘Rehbein_Schule_Hanau_9_2018.pdf’

Rehbein_Schule_Hanau_9_2018.pdf 100%[==============================================================================================>]   2.36M  8.10MB/s    in 0.3s

2020-10-29 17:59:32 (8.10 MB/s) - ‘Rehbein_Schule_Hanau_9_2018.pdf’ saved [2479934/2479934]
{code}
I am attaching the file I got, which I can't open with PDFBox: [^Rehbein_Schule_Hanau_9_2018.pdf]
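
Since the stack trace quoted in the issue fails while reading the header (COSParser.parseHeader), one thing that may be worth checking on the downloaded file is whether a "%PDF-" marker appears near its start at all. The sketch below is only a diagnostic aid and uses nothing but the JDK. The class name PdfHeaderCheck, its method names, and the 1024-byte scan window are assumptions made for illustration (lenient viewers are commonly said to accept a header anywhere in the first 1024 bytes); none of this is PDFBox code, and it does not claim to identify the actual cause of this issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: looks for the "%PDF-" marker
// near the start of a file.
public class PdfHeaderCheck {
    // Assumed scan window; chosen for illustration only.
    static final int SCAN_LIMIT = 1024;

    // Returns the offset of "%PDF-" within the first SCAN_LIMIT bytes, or -1 if absent.
    static int findPdfHeader(byte[] data) {
        byte[] marker = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        int limit = Math.min(data.length, SCAN_LIMIT);
        for (int i = 0; i + marker.length <= limit; i++) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    // Reads at most SCAN_LIMIT bytes from the beginning of the file.
    static byte[] readPrefix(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[SCAN_LIMIT];
            int total = 0;
            int n;
            while (total < buf.length
                    && (n = in.read(buf, total, buf.length - total)) > 0) {
                total += n;
            }
            return Arrays.copyOf(buf, total);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfHeaderCheck <file.pdf>");
            return;
        }
        int offset = findPdfHeader(readPrefix(Paths.get(args[0])));
        if (offset < 0) {
            System.out.println("No %PDF- marker in the first " + SCAN_LIMIT + " bytes");
        } else {
            System.out.println("%PDF- marker found at byte offset " + offset);
        }
    }
}
```

It could be run as `java PdfHeaderCheck Rehbein_Schule_Hanau_9_2018.pdf`. An offset greater than 0 would mean some bytes precede the header, and -1 would mean no header was found within the scanned window.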

> java.io.IOException: Error: End-of-File, expected line during PDDocument.load
> -----------------------------------------------------------------------------
>
>                 Key: PDFBOX-5006
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5006
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.20, 2.0.21
>         Environment: Debian, macOS, OpenJDK 12
>            Reporter: Nicolas M
>            Priority: Major
>         Attachments: Rehbein_Schule_Hanau_9_2018.pdf
>
>
> I get an IOException when I try to open some PDFs with the library (calling 
> PDDocument.load(pdfFile)). Here are some URLs of affected PDFs (I think the 
> problem is the same for all of them):
>  * [https://www.buerger.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf]
>  * [http://www.geislerfarms.com/documents/filelibrary/Geisler_COVID_statement_0A7A094E1EFB7.pdf]
>  * [http://www.sahealth.sa.gov.au/wps/wcm/connect/c736e1d5-932e-4f8a-8e56-52ab10a214fd/SALHN+Governing+Board+Minutes+-+5+March+2020.pdf?MOD=AJPERES&CACHEID=ROOTWORKSPACE-c736e1d5-932e-4f8a-8e56-52ab10a214fd-niR9I3J]
>
> I think the files are malformed and do not comply with the PDF specification, 
> but I can open them with other PDF viewers (the Chrome PDF viewer, for example).
>  
> Here is the stack trace:
> {code:java}
> java.io.IOException: Error: End-of-File, expected line
>     at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1098)
>     at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2581)
>     at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2560)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1099)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1082)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1041)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:989)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
