[ https://issues.apache.org/jira/browse/TIKA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr closed TIKA-4047.
---------------------------------
Resolution: Not A Bug
> Various PDF Parsing errors
> --------------------------
>
> Key: TIKA-4047
> URL: https://issues.apache.org/jira/browse/TIKA-4047
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.4.1
> Environment: Windows 11, using Tika server /tika/body API.
> Reporter: Carey Halton
> Priority: Minor
> Attachments: ML100500495 error.txt, ML100500495.PDF,
> ML100840685 error.txt, ML100840685.pdf, ML22020A080 error.txt,
> ML22020A080.pdf
>
>
> We are seeing various PDF parser errors for a few specific PDF files with
> Tika 2.4.1. We were hoping that someone could help us investigate whether
> these are bugs in the PDF parser or PDFBox that could be fixed to allow the
> files to be parsed (or are already fixed in a later version), or whether
> these particular files are corrupted in a way that makes parsing them
> impossible. I have attached the three files, along with txt files that
> contain the exception message we are seeing for each of them.