[ 
https://issues.apache.org/jira/browse/TIKA-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780023#comment-17780023
 ] 

Tim Allison commented on TIKA-2689:
-----------------------------------

Apologies for my delay, [~joshm]. Are you parsing the PDFs and getting 
{{application/pdf}} or are you running detect on them?

If you're parsing and able to share the files even if privately, I'd be 
grateful to take a look.

If you're running detect... unfortunately, to figure out whether a PDF is 
actually an Adobe Illustrator file, we have to do a full parse, and we didn't 
want to add parsing to the detect step. Theoretically, we could add a 
"detector" that parses PDFs to make that distinction, but that would be 
computationally costly. For some use cases, the extra cost just doesn't matter.
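
To illustrate why the two paths can disagree, here is a minimal, self-contained 
sketch. It is not Tika's code or API: the class and method names are 
hypothetical, and it assumes a PDF-based .ai file, which starts with the same 
"%PDF-" header as an ordinary PDF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration only; none of these names come from Tika.
class DetectionSketch {

    // Name-based guess: looks only at the extension, never at the bytes.
    static String detectByName(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".ai")) {
            return "application/illustrator";
        }
        if (lower.endsWith(".pdf")) {
            return "application/pdf";
        }
        return "application/octet-stream";
    }

    // Content-based guess: looks only at the leading magic bytes.
    // Assumption: a PDF-based .ai file begins with "%PDF-", so it is
    // indistinguishable from a plain PDF at this level.
    static String detectByMagic(byte[] content) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (content.length >= magic.length
                && Arrays.equals(Arrays.copyOf(content, magic.length), magic)) {
            return "application/pdf";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        // Stand-in for the first bytes of a PDF-based Illustrator file.
        byte[] header = "%PDF-1.5\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectByName("example.ai"));
        System.out.println(detectByMagic(header));
    }
}
```

Telling the two apart would mean reading past the header into the document 
itself, which is the full parse mentioned above.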

> *.ai type (Adobe illustrator ) files are not detected correctly.
> ----------------------------------------------------------------
>
>                 Key: TIKA-2689
>                 URL: https://issues.apache.org/jira/browse/TIKA-2689
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.16, 1.17, 1.18
>            Reporter: Amit Pandey
>            Priority: Major
>             Fix For: 2.8.0
>
>         Attachments: example.ai, screenshot-1.png
>
>
> There is an inconsistency in detecting *.ai files when using the different 
> overloaded detect methods. When I use _detect(String filename)_, it gives the 
> correct file type, "*application/illustrator*". If I use _detect(InputStream 
> is, String filename)_ or _detect(File fileObj)_, it gives the file type 
> "*application/pdf*".
> Here is the sample code I used.
>   
> [https://stackoverflow.com/questions/51359351/tika-detect-method-not-giving-same-exact-file-type|http://example.com/]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
