[ https://issues.apache.org/jira/browse/TIKA-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523790#comment-17523790 ]
Tim Allison commented on TIKA-3718:
-----------------------------------
For posterity, and for anyone else facing this kind of issue in the future, here
is my response to [~DavidAvant]'s question on the PDFBox issue about having to
defend against resource DoS:
bq. An answer on the Tika side. Yes, parsing is dangerous and you’ll need to
isolate at the process level; thread level isolation is not enough. See what we
offer in Tika for robustness:
https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=148647830#content/view/148647830
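
To make the process-level point concrete, here is a minimal sketch in plain
Java. It is not Tika's own mechanism (see the wiki page above for what Tika
offers); it only shows the general idea of running the parse in a child JVM
with a hard deadline. The class name IsolatedParse, the method runWithTimeout,
the output file name and the 60 second limit are all made up for illustration.
The command line is the one from the issue description.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Tika or PDFBox.
public class IsolatedParse {

    /**
     * Runs a command in a child process and waits at most timeoutSeconds.
     * Returns the exit code, or an empty OptionalInt if the child had to be killed.
     */
    public static OptionalInt runWithTimeout(List<String> command, File outputFile,
                                             long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Send stdout and stderr to a file so the child can never block
        // on a full pipe that nobody is reading.
        pb.redirectErrorStream(true);
        pb.redirectOutput(outputFile);

        Process child = pb.start();
        if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // A thread stuck in an infinite loop cannot be stopped reliably
            // from inside the same JVM, but a child process can be killed.
            child.destroyForcibly();
            child.waitFor(); // wait for the kill to take effect
            return OptionalInt.empty();
        }
        return OptionalInt.of(child.exitValue());
    }

    public static void main(String[] args) throws Exception {
        // Assumes "java" is on the PATH and that the jar and the PDF sit in
        // the working directory, as in the reproduction steps of the issue.
        List<String> command = List.of("java", "-jar", "tika-app-1.28.1.jar", "map.pdf");
        OptionalInt exit = runWithTimeout(command, new File("map.out"), 60);
        System.out.println(exit.isPresent()
                ? "child exited with " + exit.getAsInt()
                : "child timed out and was killed");
    }
}
```

Note that destroyForcibly() only targets the direct child; a real setup would
also want memory limits on the child JVM and cleanup of partial output.
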
> Special PDF document causes Tika parser to hang
> -----------------------------------------------
>
> Key: TIKA-3718
> URL: https://issues.apache.org/jira/browse/TIKA-3718
> Project: Tika
> Issue Type: Bug
> Components: app
> Affects Versions: 1.28.1, 2.3.0
> Environment: The problem can be reproduced under (Windows + Java8).
> However, the problem does not appear to be environment specific.
> Reporter: David Avant
> Priority: Major
> Attachments: map.pdf
>
>
> Attempting to parse the attached "map.pdf" causes the Tika parser to hang due
> to an infinite loop involving "PDFStreamParser" logic.
> This problem occurs in both tika-app 1.28.1 and 2.3.0.
> It is also worth noting that Acrobat itself will become unresponsive if
> attempting to open this document.
> To reproduce the problem, just run:
> java -jar tika-app-1.28.1.jar map.pdf
--
This message was sent by Atlassian Jira
(v8.20.1#820001)