[ https://issues.apache.org/jira/browse/TIKA-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686957#comment-17686957 ]
Julio J. Gomez Diaz commented on TIKA-3965:
-------------------------------------------
Thank you very much, [~tallison]. You clarified the topic. Given this, the
reported issue is not a bug, so I'll close it.
Thanks again for your insights into this matter.
Best regards,
> Detector for valid PDF files
> ----------------------------
>
> Key: TIKA-3965
> URL: https://issues.apache.org/jira/browse/TIKA-3965
> Project: Tika
> Issue Type: Bug
> Components: tika-core
> Affects Versions: 2.6.0
> Reporter: Julio J. Gomez Diaz
> Priority: Minor
> Attachments: test2.pdf
>
>
> If we use MagicDetector, or content-based detection via DefaultDetector, Tika
> identifies an invalid file such as the attached one as a PDF. The file has this
> simple content:
>
> {code:java}
> <script>alert(1)</script>
> %PDF-1.7{code}
>
> Is there an alternative detector in Tika that reads the whole file content so
> that a non-valid PDF file is not detected as PDF?
> If there is not, would it make sense to implement one? Which would be the right
> Java package location for it?
>
> This sample file is reported as invalid by Adobe Reader and by every online PDF
> processor we found, but Tika identified it as PDF.
>
> Thanks in advance
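
For illustration only, the kind of stricter check asked about above could be
sketched in plain Java as follows. This is not Tika code and does not use any
Tika API: the class name StrictPdfHeaderCheck and its methods are hypothetical,
and the rule it applies (the bytes "%PDF-" must sit at offset 0) is an
assumption chosen for the example. It only inspects the header position and
says nothing about whether the rest of the file is a well-formed PDF.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: accepts a file only when the
// "%PDF-" marker is the very first thing in it.
public class StrictPdfHeaderCheck {

    // The PDF header marker as ASCII bytes.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True only if the given bytes begin with "%PDF-" at offset 0.
    public static boolean hasPdfHeaderAtStart(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, PDF_MAGIC.length), PDF_MAGIC);
    }

    // Stream variant: reads just enough bytes to compare (requires Java 11+).
    public static boolean hasPdfHeaderAtStart(InputStream in) throws IOException {
        return hasPdfHeaderAtStart(in.readNBytes(PDF_MAGIC.length));
    }

    public static void main(String[] args) throws IOException {
        // Content of the file from this report: markup first, header second.
        byte[] reported = "<script>alert(1)</script>\n%PDF-1.7"
                .getBytes(StandardCharsets.US_ASCII);
        // A file whose first bytes are the header.
        byte[] headerFirst = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);

        System.out.println(hasPdfHeaderAtStart(new ByteArrayInputStream(reported)));
        System.out.println(hasPdfHeaderAtStart(new ByteArrayInputStream(headerFirst)));
    }
}
```

With this rule, the content from the report is not accepted, while a file that
begins with the header is. A check like this would run in addition to type
detection, since it answers a different question (is the header in the
expected place) than detection does (what type does this look like).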