[
https://issues.apache.org/jira/browse/TIKA-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780023#comment-17780023
]
Tim Allison commented on TIKA-2689:
-----------------------------------
Apologies for my delay, [~joshm]. Are you parsing the PDFs and getting
{{application/pdf}} or are you running detect on them?
If you're parsing and able to share the files even if privately, I'd be
grateful to take a look.
If you're running detect... unfortunately, to figure out whether a file is
Adobe Illustrator (AI) rather than plain PDF, we have to do a full parse, and
we didn't want to add parsing to the detect step. Theoretically, we could add a
"detector" that parses PDFs to do that detection, but that would be
computationally costly. For some use cases, the extra cost just doesn't matter.
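To illustrate why the two detection paths can disagree, here is a minimal, self-contained sketch. It is not Tika's implementation: the class {{AiDetectionSketch}} and its methods are hypothetical, and it assumes the .ai file is PDF-compatible and starts with the usual {{%PDF-}} header, which is consistent with the {{application/pdf}} result reported in this issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical illustration only; this does not use or reproduce Tika's code.
public class AiDetectionSketch {
    public static final String PDF = "application/pdf";
    public static final String AI = "application/illustrator";
    public static final String UNKNOWN = "application/octet-stream";

    // Name-only detection: looks at the extension and never reads any bytes.
    public static String detectByName(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".ai")) {
            return AI;
        }
        if (lower.endsWith(".pdf")) {
            return PDF;
        }
        return UNKNOWN;
    }

    // Content detection: compares only the leading magic bytes.
    // Assumption: a PDF-compatible .ai file begins with "%PDF-", so a
    // header check alone cannot tell it apart from an ordinary PDF.
    public static String detectByMagic(byte[] head) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (head.length < magic.length) {
            return UNKNOWN;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return UNKNOWN;
            }
        }
        return PDF;
    }

    public static void main(String[] args) {
        // Stand-in for the first bytes of a PDF-compatible .ai file.
        byte[] head = "%PDF-1.5\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("by name:  " + detectByName("example.ai"));
        System.out.println("by magic: " + detectByMagic(head));
    }
}
```

Under those assumptions, the name-based path reports {{application/illustrator}} while the header-based path can only report {{application/pdf}}. Telling the two apart from content would require looking inside the PDF structure, which is the full parse mentioned above.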
> *.ai type (Adobe illustrator ) files are not detected correctly.
> ----------------------------------------------------------------
>
> Key: TIKA-2689
> URL: https://issues.apache.org/jira/browse/TIKA-2689
> Project: Tika
> Issue Type: Bug
> Components: core
> Affects Versions: 1.16, 1.17, 1.18
> Reporter: Amit Pandey
> Priority: Major
> Fix For: 2.8.0
>
> Attachments: example.ai, screenshot-1.png
>
>
> There is an inconsistency in detecting *.ai files across the different
> overloaded detect methods. When I use _detect(String filename)_, it gives the
> correct file type, "*application/illustrator*". If I use _detect(InputStream
> is, String filename)_ or _detect(File fileObj)_, it gives the file type
> "*application/pdf*".
> Here is the sample code I used.
>
> [https://stackoverflow.com/questions/51359351/tika-detect-method-not-giving-same-exact-file-type|http://example.com/]
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)