[
https://issues.apache.org/jira/browse/TIKA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723993#comment-17723993
]
Tilman Hausherr commented on TIKA-4047:
---------------------------------------
The current Tika version is 2.8.0. I only got an error with the first file
when using tika-app. That file is corrupt; try displaying page 68 in Adobe
Reader.
> Various PDF Parsing errors
> --------------------------
>
> Key: TIKA-4047
> URL: https://issues.apache.org/jira/browse/TIKA-4047
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.4.1
> Environment: Windows 11, using Tika server /tika/body API.
> Reporter: Carey Halton
> Priority: Minor
> Attachments: ML100500495 error.txt, ML100500495.PDF,
> ML100840685 error.txt, ML100840685.pdf, ML22020A080 error.txt,
> ML22020A080.pdf
>
>
> We are seeing various PDF parser errors for a few specific PDF files with
> Tika 2.4.1. We were hoping that someone could help us investigate and see if
> there are bugs with the PDF parser or PDFBox that could be fixed to allow
> these to be parsed (or let us know if they are already fixed in a later
> version), or if there is just something corrupted about these particular
> files that makes parsing them impossible. I have attached the 3 files as well
> as txt files that include the exception message we are seeing for each of
> them.