[ 
https://issues.apache.org/jira/browse/TIKA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed TIKA-4047.
---------------------------------
    Resolution: Not A Bug

> Various PDF Parsing errors
> --------------------------
>
>                 Key: TIKA-4047
>                 URL: https://issues.apache.org/jira/browse/TIKA-4047
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.1
>         Environment: Windows 11, using Tika server /tika/body API.
>            Reporter: Carey Halton
>            Priority: Minor
>         Attachments: ML100500495 error.txt, ML100500495.PDF,
> ML100840685 error.txt, ML100840685.pdf,
> ML22020A080 error.txt, ML22020A080.pdf
>
>
> We are seeing PDF parser errors for three specific PDF files with Tika 2.4.1.
> Could someone help us determine whether these are caused by bugs in the PDF
> parser or in PDFBox that could be fixed (or that are already fixed in a later
> version), or whether the files themselves are corrupted in a way that makes
> parsing them impossible? I have attached the three files, along with txt
> files containing the exception message we are seeing for each of them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
